Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

Michael McCandless Mon, 04 Nov 2013 02:50:23 -0800

On Sun, Nov 3, 2013 at 7:59 PM, Stephen GRAY <[email protected]> wrote:
> UNOFFICIAL
>
> Hi Mike,
>
> I ran it again and this time the two methods came out about the same:
> 168-288 ms to process 173,000 documents for the walking method and
> 160-205 ms for the MultiDocValues method. I don't know what was happening
> with my last test.


Hmm, still curious.  But it could simply be that the per-doc binary
search is in the noise...

> Here is my code:

The code looks correct, but are you certain the hits come back in
docID order?  Are you sorting by (SortField.FIELD_DOC)?
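
To make the ordering question concrete, here is a minimal, Lucene-free
sketch of the two ways to map global docIDs to segments. Everything in
it is an assumption for illustration: the class name SegmentSplit, the
docBases array (the first global docID of each segment, ascending, with
no empty segments), and both method names. It is not the code from the
quoted message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class SegmentSplit {

    // Per-doc binary search: works for hits in any order.
    // docBases[i] is the first global docID of segment i (ascending, docBases[0] == 0).
    static int segmentOf(int globalDoc, int[] docBases) {
        int idx = Arrays.binarySearch(docBases, globalDoc);
        // On a miss, binarySearch returns -(insertionPoint) - 1,
        // and the owning segment is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Walking: only correct when sortedDocs is in ascending docID order,
    // because the segment pointer never moves backwards.
    static List<List<Integer>> splitByWalking(int[] sortedDocs, int[] docBases) {
        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < docBases.length; i++) {
            perSegment.add(new ArrayList<>());
        }
        int seg = 0;
        for (int doc : sortedDocs) {
            // Advance to the segment that contains this global docID.
            while (seg + 1 < docBases.length && doc >= docBases[seg + 1]) {
                seg++;
            }
            // Store the segment-relative docID.
            perSegment.get(seg).add(doc - docBases[seg]);
        }
        return perSegment;
    }
}
```

If the hits are not sorted by docID, the walking version quietly assigns
docs to the wrong segment, while the binary search version stays correct
at the cost of one lookup per doc.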

> Thanks for the tip on using a custom Collector. This is in Lucene in Action 
> (great book by the way).

I'm glad to hear that, thanks!

Another option is to fold this processing (looking up the NDV value
for the doc and then doing something with it) into your Collector: the
Collector is already told whenever the search switches to a new
reader, so you'd look up your NDV instance there, and then do your
processing in collect(int doc).
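
The shape of that pattern, as a rough Lucene-free sketch: the types
below (SegmentValues, SummingCollector, setNextSegment) are hypothetical
stand-ins, not Lucene classes, and summing the values is only a
placeholder for whatever per-doc processing you need. In a real Lucene
4.x Collector the reader switch arrives via setNextReader, and collect
receives docIDs relative to that reader.

```java
// Hypothetical stand-in for a per-segment numeric doc values source.
interface SegmentValues {
    long get(int segmentDoc);
}

// Hypothetical collector: fetches the per-segment values once per
// segment, then uses them for every doc collected in that segment.
final class SummingCollector {
    private SegmentValues current; // values for the segment being collected
    private long sum;

    // Called once whenever collection moves to a new segment.
    void setNextSegment(SegmentValues values) {
        current = values;
    }

    // Called per hit with a segment-relative docID; no global-to-segment
    // mapping is needed here.
    void collect(int segmentDoc) {
        sum += current.get(segmentDoc);
    }

    long sum() {
        return sum;
    }
}
```

Because the lookup happens once per segment rather than once per doc,
this avoids both the per-doc binary search and the need for hits to be
in docID order.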

Mike McCandless

http://blog.mikemccandless.com

