UNOFFICIAL

Hi Mike,

Thanks for the helpful response. I'll try them both and see whether any performance improvement I get from the more complicated method is worth the extra complexity.

Thanks,
Steve

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, 30 October 2013 9:57 PM
To: Lucene Users
Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

You should try MultiDocValues first; it's trivial to use and may not be horribly slow. It must do a binary search for every docID lookup.

And then if this is too slow, assuming you traverse the docIDs in order, you can use IndexReader.leaves() to get the sub-readers. The docIDs are just "appended" from these sub-readers, so you'd walk your docIDs and also walk your sub-readers, moving to the next sub-reader once you have a docID that's beyond its end. Each sub-reader spans AtomicReaderContext.docBase to docBase + AtomicReaderContext.reader().maxDoc().

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 30, 2013 at 2:21 AM, Stephen GRAY <stephen.g...@immi.gov.au> wrote:
> UNOFFICIAL
> Hi everyone,
>
> I am trying to write an application that loops through 500,000 - 1,000,000
> documents returned by a search and calculates some statistics using the value
> in a stored field. Obviously this needs to be as fast as possible so I am
> using a NumericDocValues field to store the value.
>
> What I don't know is how to get the NumericDocValues value for each docId
> returned by the search. What I've been told to do in a previous thread was:
>
> 1. Split the docIds according to the segment they belong to
>
> 2. Get a per-segment NumericDocValues instance and use this to extract
> the values
>
> Can someone tell me how to do 1 and 2? I don't know how to discover what
> segment a given docId is in, or how to convert a segment into a
> NumericDocValues array.
>
> By the way it's also been suggested that I just use
> MultiDocValues.getNumericValues, but I gather that this will be much slower.
>
> I'd appreciate any help,
>
> Thanks,
> Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
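
For anyone finding this thread later, the two approaches Mike describes can be sketched in plain Java, without Lucene on the classpath: a binary search per docID (the cost he attributes to MultiDocValues) and a single in-order walk over ascending docIDs and sub-readers. This is only an illustration under stated assumptions, not Lucene's own code. The class SegmentSplitter, its method names and the starts array are hypothetical; in a Lucene 4.x application the per-segment sizes would presumably come from IndexReader.leaves(), using each context's docBase and reader().maxDoc().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene. It models sub-readers only by
// their sizes (maxDoc), which is all the docID arithmetic needs.
public class SegmentSplitter {

    // starts[i] plays the role of docBase for sub-reader i;
    // starts[n] is the total number of documents across all sub-readers.
    public static int[] buildStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Binary search per lookup: returns the last sub-reader whose start is <= docId.
    public static int subIndex(int docId, int[] starts) {
        if (docId < 0 || docId >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        int lo = 0;
        int hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (starts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // In-order walk: docIDs must be ascending. Each docID is converted to a
    // segment-local ID (docId - docBase) and appended to its segment's list.
    public static List<List<Integer>> splitBySegment(int[] sortedDocIds, int[] starts) {
        int numSegments = starts.length - 1;
        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < numSegments; i++) {
            perSegment.add(new ArrayList<>());
        }
        int seg = 0;
        for (int docId : sortedDocIds) {
            // Move to the next sub-reader once the docID is beyond the current one's end.
            while (seg < numSegments && docId >= starts[seg + 1]) {
                seg++;
            }
            if (seg == numSegments || docId < starts[seg]) {
                throw new IllegalArgumentException("docIds must be ascending and in range: " + docId);
            }
            perSegment.get(seg).add(docId - starts[seg]);
        }
        return perSegment;
    }

    public static void main(String[] args) {
        int[] starts = buildStarts(new int[] {3, 2, 4}); // three segments, 9 docs in total
        int[] hits = {0, 2, 3, 4, 8};                    // ascending docIDs from a search
        System.out.println(splitBySegment(hits, starts)); // prints [[0, 2], [0, 1], [3]]
        System.out.println(subIndex(4, starts));          // prints 1
    }
}
```

The lists hold segment-local IDs, which is the form a per-segment NumericDocValues lookup would be expected to take. The walk touches each docID and each sub-reader once, while the binary search repeats its work for every lookup, so the difference should only matter when the docIDs are already in order and the hit count is large.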