RE: splitting docIds from a search by segment [SEC=UNOFFICIAL]

Stephen GRAY Wed, 30 Oct 2013 19:45:48 -0700

UNOFFICIAL

Hi Mike,


Thanks for the helpful response. I'll try them both and see if any performance 
improvement I get from the more complicated method is worth the extra complexity.

Thanks,
Steve

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, 30 October 2013 9:57 PM
To: Lucene Users
Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

You should try MultiDocValues first; it's trivial to use and may not be 
horribly slow.

It must do a binary-search for every docID lookup.
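
To illustrate the per-lookup cost mentioned above, here is a minimal sketch of mapping a global docID to its segment with a binary search over the segments' starting docIDs. It is plain Java rather than Lucene code: the class SegmentLookup and the docBases array are hypothetical stand-ins, and the sketch assumes no empty segments (all starting docIDs are distinct). It is not the library's actual implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in (not a Lucene class): finds the segment that owns a
// global docID by binary-searching the segments' starting docIDs.
final class SegmentLookup {
    // docBases must be sorted ascending, start at 0, and contain no duplicates
    // (i.e. no empty segments); one entry per segment.
    static int segmentFor(int[] docBases, int docID) {
        int pos = Arrays.binarySearch(docBases, docID);
        // Exact hit: docID is the first document of segment pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }
}
```

Each call costs O(log S) for S segments, which is the overhead paid on every docID lookup when the segments are not walked in order.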

And then if this is too slow, assuming you traverse the docIDs in order, you 
can use IndexReader.leaves() to get the sub-readers.  The docIDs are just 
"appended" from these sub-readers, so you'd walk your docIDs and also walk your 
sub-readers, moving to the next sub-reader once you have a docID that's beyond 
its end.  Each sub-reader spans AtomicReaderContext.docBase (inclusive) to 
docBase + AtomicReaderContext.reader().maxDoc() (exclusive).
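
The ordered walk described above can be sketched as follows. This is a self-contained illustration in plain Java, not Lucene code: Leaf and SegmentWalk are hypothetical names, Leaf.docBase and Leaf.maxDoc() mirror the AtomicReaderContext fields mentioned above, and the long[] array stands in for a per-segment NumericDocValues instance. With Lucene 4.x the per-segment values would presumably come from each sub-reader's getNumericDocValues(field) and be read with the segment-local ID (docID - docBase); treat that mapping as an assumption to check against your version.

```java
// Hypothetical stand-in for one sub-reader (leaf): docBase is its first global
// docID, and values[i] is the numeric value of segment-local document i.
final class Leaf {
    final int docBase;
    final long[] values;

    Leaf(int docBase, long[] values) {
        this.docBase = docBase;
        this.values = values;
    }

    // Number of documents in this segment.
    int maxDoc() {
        return values.length;
    }
}

final class SegmentWalk {
    // Sums the values for the given global docIDs. The docIDs must be sorted
    // ascending and must all fall inside the range covered by the leaves.
    static long sum(Leaf[] leaves, int[] sortedDocIDs) {
        long total = 0;
        int leafIdx = 0;
        for (int docID : sortedDocIDs) {
            // Move to the next leaf while docID is at or past the current end.
            while (docID >= leaves[leafIdx].docBase + leaves[leafIdx].maxDoc()) {
                leafIdx++;
            }
            Leaf leaf = leaves[leafIdx];
            // Convert the global docID to a segment-local one before the lookup.
            total += leaf.values[docID - leaf.docBase];
        }
        return total;
    }
}
```

Because the leaf index only ever moves forward, the whole pass is linear in the number of docIDs plus the number of segments, with no per-document binary search.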

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 30, 2013 at 2:21 AM, Stephen GRAY <stephen.g...@immi.gov.au> wrote:
> UNOFFICIAL
> Hi everyone,
>
> I am trying to write an application that loops through 500,000 - 1,000,000 
> documents returned by a search and calculates some statistics using the value 
> in a stored field. Obviously this needs to be as fast as possible so I am 
> using a NumericDocValues field to store the value.
>
> What I don't know is how to get the NumericDocValues value for each docId 
> returned by the search. What I've been told to do in a previous thread was:
>
> 1. Split the docIds according to the segment they belong to
>
> 2. Get a per-segment NumericDocValues instance and use this to extract 
> the values
>
> Can someone tell me how to do 1 and 2? I don't know how to discover what 
> segment a given docId is in, or how to convert a segment into a 
> NumericDocValues array.
>
> By the way it's also been suggested that I just use 
> MultiDocValues.getNumericValues, but I gather that this will be much slower.
>
> I'd appreciate any help,
>
> Thanks,
> Steve
>
> UNOFFICIAL

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


UNOFFICIAL
