Hi,

> ...I get around 3 million hits. Each of the hits is processed and information
> from a certain field is used.

That's of course fine, but:

> After certain number of hits, somewhere around 1 million (not always the same
> number) I get OutOfMemory exception that looks like this:

You did not tell us *how* you get the hits. If you do something like
Searcher.search(query, 1000000), then it can easily run out of memory (sooner
or later, maybe while decompressing results, maybe somewhere else). Lucene
always collects "top-ranking" results, and for doing that it uses a priority
queue. With the above call (passing 1 million or more as the number of
top-ranking results), this will use insane amounts of memory. Like most full
text search engines, Lucene is optimized for quickly getting the best results.
Fetching *all* possible hits is not really the intended use case of a full
text search engine (especially as hits that far down the list are in most
cases no longer relevant to your query).
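
To illustrate why the requested number of results matters, here is a minimal
sketch of bounded top-N selection with a min-heap. It is a plain-Java model
only, not Lucene's actual implementation; the class and method names are
made up for this example. The point is that the heap holds up to n entries,
so asking for a million top hits means keeping a million entries in memory.

```java
import java.util.PriorityQueue;

// Illustrative model only; the names are hypothetical and this is not Lucene code.
class TopNSketch {
    // Returns the n highest scores, best first.
    static float[] topN(float[] scores, int n) {
        if (n <= 0) {
            return new float[0];
        }
        // Min-heap: the weakest retained score sits at the head.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);          // still room: memory grows up to n entries
            } else if (s > heap.peek()) {
                heap.poll();          // evict the current weakest
                heap.add(s);
            }
        }
        float[] out = new float[heap.size()];
        // poll() returns the smallest first, so fill the array from the back.
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }
}
```

With a small n the heap stays tiny no matter how many documents match, which
is the case this kind of structure is designed for.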

To really collect all hits (but in arbitrary order, not sorted by relevance),
write your own Collector implementation that collects the results and pass it
to the searcher. There are several code samples on this mailing list.
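
As a rough sketch of the idea, the class below models the callback pattern
in plain Java without depending on Lucene. It is a hypothetical stand-in,
not the real API: in Lucene 3.x you would instead extend
org.apache.lucene.search.Collector, whose setNextReader(IndexReader, int
docBase) and collect(int doc) callbacks have the same shape (plus
setScorer and acceptsDocsOutOfOrder). Each hit is handled as it arrives and
nothing is queued, so memory use does not grow with the number of hits.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in that mirrors the shape of a Lucene 3.x Collector.
class StreamingCollectorSketch {
    private final IntConsumer handler; // what to do with each global doc ID
    private int docBase;               // offset of the current index segment
    private long count;                // number of hits seen so far

    StreamingCollectorSketch(IntConsumer handler) {
        this.handler = handler;
    }

    // Called when the search moves on to a new segment.
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // Called once per matching document, with a segment-relative ID.
    void collect(int doc) {
        count++;
        handler.accept(docBase + doc); // process immediately, keep nothing
    }

    long count() {
        return count;
    }
}
```

The handler is where you would load the one field you need and process it.
Adding docBase matters because the IDs passed to collect are relative to the
current segment, not to the whole index.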

Another approach is to use the new "searchAfter" method, available in the next
Lucene version (not yet released).

Uwe


