Hi,
Two ways I know to accomplish this:
1. Append several text documents (say 100), along with each document's
number, into one field of a Lucene document, using demarcators between
them, and search with SpanNearQuery to retrieve the matching text
documents. But if the matches run into tens of thousands of Lucene
documents, this takes too long, because I think SpanNearQuery has to walk
the positions of the concatenated text documents in the field to find each
matching span.
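A minimal sketch of the query side of approach 1, assuming Lucene 8.x (in 9.x the span classes moved to org.apache.lucene.queries.spans); the field name "body" and the helper name are illustrative only:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanSketch {
    // Build a SpanNearQuery that matches the given terms in order,
    // at most `slop` positions apart, within the concatenated field.
    public static SpanNearQuery phraseNear(String field, int slop, String... terms) {
        SpanQuery[] clauses = new SpanQuery[terms.length];
        for (int i = 0; i < terms.length; i++) {
            clauses[i] = new SpanTermQuery(new Term(field, terms[i]));
        }
        // inOrder = true: the terms must appear in the given order.
        return new SpanNearQuery(clauses, slop, true);
    }
}
```

A suitably small slop keeps a match from spanning across a demarcator term, but Lucene still has to enumerate positions for every candidate document, which is the cost described above.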
2. Instead of concatenating the text documents, create one field per text
document (100 fields), plus another 100 fields holding the corresponding
text document numbers, in each Lucene document, and search for the terms
with a DisjunctionMaxQuery made up of 100 BooleanQueries, one per text
field. Then use the Explanation object to find which text documents
matched within each hit. But again, if there are 10,000 Lucene document
matches, I need to execute 10,000 * 100 = 1,000,000 Explanation.isMatch()
calls, which again takes a lot of time.
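A minimal sketch of the query construction for approach 2, again assuming Lucene 8.x; the field-naming scheme "text_<n>" and the helper name are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class DisMaxSketch {
    // Build a DisjunctionMaxQuery with one BooleanQuery per text field:
    // each BooleanQuery requires all search terms in that field.
    public static DisjunctionMaxQuery perFieldDisMax(String[] searchTerms, int numFields) {
        List<Query> disjuncts = new ArrayList<>();
        for (int f = 0; f < numFields; f++) {
            BooleanQuery.Builder b = new BooleanQuery.Builder();
            for (String t : searchTerms) {
                b.add(new TermQuery(new Term("text_" + f, t)), BooleanClause.Occur.MUST);
            }
            disjuncts.add(b.build());
        }
        // tieBreakerMultiplier 0.0f: a hit is scored by its best field only.
        return new DisjunctionMaxQuery(disjuncts, 0.0f);
    }
}
```

The per-hit cost then comes from calling IndexSearcher.explain(query, docId) and walking Explanation.isMatch() for each of the 100 disjuncts, which is the 10,000 * 100 blow-up mentioned above.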
What strategies do you recommend for this task, "Ways to store and search
tens of billions of text documents' content in one Lucene index", so that
I can accomplish it in optimal time?
Sincerely,
Ranganath B. N.