Hi,
 
I am trying to find a way to handle wildcard queries in Lucene without running out 
of memory, and I have been having some problems with it.  
 
I have modified parts of Lucene's search code to keep only about 1000 terms 
in memory and write the rest of the terms to a file (this is done in the getQuery() 
method of MultiTermQuery.java, PrefixQuery.java, etc.).  
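
As a rough sketch of the spill step described above, in plain Java with no Lucene classes: TermSpill, collect() and the one-term-per-line file format are hypothetical names and choices made for illustration, not the actual modification, and the sketch assumes terms never contain line breaks.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (not part of Lucene): first N terms stay in memory, the rest go to a file. */
final class TermSpill {
    final List<String> inMemory = new ArrayList<>();
    final Path spillFile;
    int spilled = 0;

    private TermSpill(Path spillFile) {
        this.spillFile = spillFile;
    }

    /** Consumes the term enumeration once, keeping at most maxInMemory terms on the heap. */
    static TermSpill collect(Iterator<String> terms, int maxInMemory) {
        try {
            TermSpill result = new TermSpill(Files.createTempFile("terms", ".txt"));
            try (BufferedWriter out =
                     Files.newBufferedWriter(result.spillFile, StandardCharsets.UTF_8)) {
                while (terms.hasNext()) {
                    String term = terms.next();
                    if (result.inMemory.size() < maxInMemory) {
                        result.inMemory.add(term);
                    } else {
                        // Assumption: one term per line, terms contain no line breaks.
                        out.write(term);
                        out.newLine();
                        result.spilled++;
                    }
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Iterator<String> expanded =
            Arrays.asList("apex", "app", "apple", "apply", "apt").iterator();
        TermSpill s = collect(expanded, 2);
        System.out.println(s.inMemory + " in memory, " + s.spilled
            + " spilled to " + s.spillFile);
    }
}
```

With a limit of 1000, the 1972 terms mentioned later in this message would split into 1000 in memory and 972 in the file.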
 
Then, when the scorer objects are created and scores are collected for each clause in 
the score() method of BooleanScorer.java, I first let all the clauses that are in 
memory be processed and then continue with the file I created earlier: I read each 
term from the file, create a TermQuery for it, get the scorer object from that 
TermQuery, and collect its scores.
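
The order of processing described above can be sketched as follows, again without Lucene classes. TwoPhaseTerms, forEachTerm() and writeTerms() are hypothetical names, and the perTerm callback only stands in for "create a TermQuery, get its scorer, collect its scores"; it does not reproduce what BooleanScorer does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: handle the in-memory terms first, then stream the spilled ones. */
final class TwoPhaseTerms {

    /** Calls perTerm for every term and returns how many terms were handled. */
    static int forEachTerm(List<String> inMemory, Path spillFile, Consumer<String> perTerm) {
        int count = 0;
        // Phase 1: terms that were kept in memory.
        for (String term : inMemory) {
            perTerm.accept(term);
            count++;
        }
        // Phase 2: terms read back one line at a time, so only one is on the heap at once.
        try (BufferedReader in = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
            String term;
            while ((term = in.readLine()) != null) {
                perTerm.accept(term);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Test helper: writes one term per line to a temporary file. */
    static Path writeTerms(List<String> terms) {
        try {
            Path file = Files.createTempFile("spilled-terms", ".txt");
            return Files.write(file, terms, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTerms(Arrays.asList("apple", "apply"));
        List<String> seen = new ArrayList<>();
        int n = forEachTerm(Arrays.asList("apex", "app"), file, seen::add);
        System.out.println(n + " terms processed in order: " + seen);
    }
}
```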
 
After that, bucketTable.collectHits() is called to collect the hits for everything.
 
I tested my changes on small indexes, with about 2 terms in memory and about 
2 or 3 terms in the file, and they worked fine.
 
However, when I tried this on a bigger index (> 1 million docs) with 1000 terms in 
memory and 972 in the file, I ended up in an infinite loop inside 
bucketTable.collectHits().  I printed the doc in each bucket and noticed that 
about halfway through the bucket list the same 4 - 5 docs started repeating for the 
rest of the list, and there was no null at the end of the list to terminate it.
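
A list that repeats the same few entries and never reaches null is a linked list with a cycle. The sketch below is a simplified model, not Lucene's code: it assumes a table that maps a doc id to a slot with doc & MASK and chains touched buckets through a next pointer, and every name in it (BucketCycleDemo, collect, hasCycle, the tiny SIZE of 8) is hypothetical. Under that assumption it shows one way the symptom could arise, namely a bucket being re-linked for a different doc before the list has been drained, and it includes a cycle check that could be run before walking the list. It is a guess at the mechanism, not a confirmed diagnosis.

```java
/** Simplified, hypothetical model of a hashed bucket table; not Lucene's implementation. */
final class BucketCycleDemo {
    static final int SIZE = 8;          // deliberately tiny so collisions are easy to see
    static final int MASK = SIZE - 1;

    static final class Bucket {
        int doc = -1;                   // doc id currently stored in this slot
        Bucket next;                    // next bucket in the list of touched buckets
    }

    final Bucket[] buckets = new Bucket[SIZE];
    Bucket first;                       // head of the list of touched buckets

    /** Records a hit for doc; a slot is reused whenever two docs share doc & MASK. */
    void collect(int doc) {
        int i = doc & MASK;
        Bucket b = buckets[i];
        if (b == null) {
            buckets[i] = b = new Bucket();
        }
        if (b.doc != doc) {
            b.doc = doc;
            // If b is already on the list (same slot, different doc, list not yet
            // drained), this re-link makes the list loop back on itself.
            b.next = first;
            first = b;
        }
    }

    /** Floyd's tortoise-and-hare check: true if following next never reaches null. */
    static boolean hasCycle(Bucket head) {
        Bucket slow = head;
        Bucket fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BucketCycleDemo table = new BucketCycleDemo();
        table.collect(1);
        table.collect(2);
        table.collect(3);
        System.out.println("docs within one window, cycle: " + hasCycle(table.first));
        table.collect(9);               // 9 & MASK == 1, the slot already used by doc 1
        System.out.println("after a colliding doc, cycle: " + hasCycle(table.first));
    }
}
```

In this model the cycle depends on which doc ids share a slot before the list is drained, not on how many terms there are, so it would not show up on an index with fewer docs than table slots.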
 
I have looked everywhere and even tried increasing the bucket table size to the 
sum of the number of terms in memory and the number of terms in the file, but that 
still did not work.
 
I would really appreciate any suggestions/ideas/help on this.
 
Thanks.
Javier

                