On 26/04/2012 09:49, Uwe Schindler wrote:

There are possibilities to truncate those lists, but this is not implemented
in Lucene core. The main problem with Lucene's segmented index structure is,
that you cannot early exit, because the very last document in the posting
list could be one with a very high score. Techniques like index pruning or
index sorting can optimize all this, but need index preprocessing and loose
updatability of the index while in use.

The Top-N result collection is optimized in Lucene (item #3), to throw away
all collected documents, where the score is never be able to get into the
top-n priority queue (too small score, once the queue is full). But the
score has to be calculated.

This paper presents yet another option: keep a value that represents max. impact of postings in a block and then skip whole blocks if the max impact is too low to get docs within the block in the queue:

http://cis.poly.edu/suel/papers/bmw.pdf

Implementing this in Lucene would require changing the iterators so that they can take a threshold value, i.e. the current lowest score in the queue, so that nextDoc() and advance() could skip whole blocks when their max impact is lower than the current lowest score.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to