Hi,

I know of two ways to accomplish this:

1. Append several text documents (say 100), along with their text document numbers, into one field of a Lucene document, separating them with demarcators, and search with a SpanNearQuery to retrieve the matching text documents. But if the matches run into tens of thousands of Lucene documents, this will take a long time, because I think SpanNearQuery has to walk the positions of the concatenated field to find each matching span.
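To make the first scheme concrete, here is a minimal sketch, independent of Lucene, of the demarcator idea: sub-documents are concatenated with a boundary token, and a matched token position in the combined field is mapped back to the index of the sub-document that contains it. The token name `__DOC_BOUNDARY__` and the helper names are my own illustration, not anything from Lucene itself.

```java
import java.util.Arrays;
import java.util.List;

public class DemarcatorIndex {
    // Hypothetical demarcator token placed between concatenated sub-documents.
    static final String DEMARCATOR = "__DOC_BOUNDARY__";

    // Build the concatenated field value from several text documents.
    static String concatenate(List<String> docs) {
        return String.join(" " + DEMARCATOR + " ", docs);
    }

    // Map a token position in the concatenated field back to the index
    // of the sub-document containing it, by counting demarcators before it.
    static int subDocumentAt(String field, int tokenPosition) {
        String[] tokens = field.split("\\s+");
        int subDoc = 0;
        for (int i = 0; i < tokenPosition && i < tokens.length; i++) {
            if (tokens[i].equals(DEMARCATOR)) subDoc++;
        }
        return subDoc;
    }

    public static void main(String[] args) {
        String field = concatenate(Arrays.asList(
                "lucene index basics",
                "span queries explained",
                "scaling search clusters"));
        // Token positions 0-2 fall in doc 0, 4-6 in doc 1, 8-10 in doc 2.
        System.out.println(subDocumentAt(field, 1)); // 0
        System.out.println(subDocumentAt(field, 5)); // 1
        System.out.println(subDocumentAt(field, 9)); // 2
    }
}
```

In a real index the positions would come from the span match rather than from re-tokenizing the stored field, but the mapping logic is the same; the per-hit cost of this position walk is exactly what makes the approach slow at tens of thousands of hits.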
2. Instead of concatenating the text documents, create one field per text document (100 fields), plus another 100 fields holding the corresponding text document numbers, in each Lucene document, and search for the search terms with a DisjunctionMaxQuery composed of 100 Boolean queries, one per text field. Then use the Explanation object to find which text documents matched within each hit. But again, if there are 10,000 matching Lucene documents, I would need to call Explanation.isMatch() 10,000 × 100 = 1,000,000 (10 lakh) times, which again takes a lot of time.

What strategies do you recommend for the task "ways to store and search tens of billions of text documents' content in one Lucene index", so that I can accomplish this in optimal time?

Sincerely,
Ranganath B. N.