Andrzej Bialecki wrote:
> Shouldn't this be combined with a HitCollector that collects only the first-n matches? Otherwise we still need to scan the whole posting list...

Yes.  I was just posting the work-in-progress.

We will also need to estimate the total number of matches by extrapolating linearly from the maximum doc id processed. Finally, it is probably rather slow for large indexes, whose .fdt won't fit in memory. A simple way to improve that might be to use Similarity.floatToByte-encoded floats when sorting (e.g., the norm from an untokenized field) so that documents whose boosts are close are not re-ordered. I'll start work on these in the morning. (It is currently my middle-of-night.)
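
For what it's worth, the linear extrapolation could look roughly like the sketch below. This is only an illustration, not the patch itself: the class name HitEstimate, the method estimateTotal, and its parameters are hypothetical, and it assumes matches are spread evenly over doc ids, which may not hold in practice.

```java
// Hypothetical helper (not part of Lucene): estimates the total number of
// matches when collection stops early after the first n hits.
public final class HitEstimate {
    private HitEstimate() {}

    /**
     * Linearly extrapolates the total match count.
     *
     * @param hitsCollected number of matches seen before stopping
     * @param lastDocId     largest (0-based) doc id processed
     * @param maxDoc        total number of documents in the index
     */
    public static long estimateTotal(int hitsCollected, int lastDocId, int maxDoc) {
        if (hitsCollected < 0 || lastDocId < 0 || maxDoc <= 0 || lastDocId >= maxDoc) {
            throw new IllegalArgumentException("inconsistent arguments");
        }
        // Doc ids are 0-based, so lastDocId + 1 documents have been scanned.
        long scanned = (long) lastDocId + 1;
        // Assumption: the match density seen so far holds for the rest of the index.
        return Math.round((double) hitsCollected * maxDoc / scanned);
    }

    public static void main(String[] args) {
        // 10 hits found in the first 100 of 1000 documents.
        System.out.println(estimateTotal(10, 99, 1000));
    }
}
```

If the whole index has been scanned (lastDocId == maxDoc - 1), the estimate is just the number of hits collected, so the exact count is returned whenever no early termination happened.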

Doug
