Hi,
Two ways I know to accomplish this:
1. Append several text documents (say 100), along with each document's
number, into one field of a Lucene document, using demarcators between
them, and search with SpanNearQuery to retrieve the matching text
documents. But if the matches run into tens of thousands of Lucene
documents, this takes too long, because I think SpanNearQuery has to walk
the positions of the concatenated text documents in the field to find each
matching span.
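A minimal sketch of the query side of approach 1, assuming Lucene 8.x (in 9.x the span classes moved to org.apache.lucene.queries.spans); the field name "body" and the helper name are illustrative only:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanSketch {
    // Build a SpanNearQuery that matches the given terms in order,
    // at most `slop` positions apart, within the concatenated field.
    public static SpanNearQuery phraseNear(String field, int slop, String... terms) {
        SpanQuery[] clauses = new SpanQuery[terms.length];
        for (int i = 0; i < terms.length; i++) {
            clauses[i] = new SpanTermQuery(new Term(field, terms[i]));
        }
        // inOrder = true: the terms must appear in the given order.
        return new SpanNearQuery(clauses, slop, true);
    }
}
```

A suitably small slop keeps a match from spanning across a demarcator term, but Lucene still has to enumerate positions for every candidate document, which is the cost described above.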
2. Instead of concatenating the text documents, create one field per text
document (100 fields), plus another 100 fields holding the corresponding
text document numbers, in each Lucene document, and search for the terms
with a DisjunctionMaxQuery made up of 100 BooleanQueries, one per text
field. Then use the Explanation object to find which text documents
matched within each hit. But again, if there are 10,000 Lucene document
matches, I need to execute 10,000 * 100 = 1,000,000 Explanation.isMatch()
calls, which again takes a lot of time.
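A minimal sketch of the query construction for approach 2, again assuming Lucene 8.x; the field-naming scheme "text_<n>" and the helper name are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class DisMaxSketch {
    // Build a DisjunctionMaxQuery with one BooleanQuery per text field:
    // each BooleanQuery requires all search terms in that field.
    public static DisjunctionMaxQuery perFieldDisMax(String[] searchTerms, int numFields) {
        List<Query> disjuncts = new ArrayList<>();
        for (int f = 0; f < numFields; f++) {
            BooleanQuery.Builder b = new BooleanQuery.Builder();
            for (String t : searchTerms) {
                b.add(new TermQuery(new Term("text_" + f, t)), BooleanClause.Occur.MUST);
            }
            disjuncts.add(b.build());
        }
        // tieBreakerMultiplier 0.0f: a hit is scored by its best field only.
        return new DisjunctionMaxQuery(disjuncts, 0.0f);
    }
}
```

The per-hit cost then comes from calling IndexSearcher.explain(query, docId) and walking Explanation.isMatch() for each of the 100 disjuncts, which is the 10,000 * 100 blow-up mentioned above.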
What strategies do you recommend for this task, "Ways to store and search
tens of billions of text documents' content in one Lucene index", so that
I can accomplish it in optimal time?
Sincerely,
Ranganath B. N.