It's very strange that you see faster performance using MultiDocValues: that simply should not be the case. Can you share your per-segment code?
Also, it's rather inefficient to collect all hits by passing maxDoc as n to IndexSearcher.search; if you really just want the docIDs and you don't care about order, it's better to make a custom Collector that simply appends the docID to an array/list. I believe Lucene in Action includes an example of this (disclosure: I'm one of the authors).

Mike McCandless

http://blog.mikemccandless.com

On Sun, Nov 3, 2013 at 5:37 PM, Stephen GRAY <stephen.g...@immi.gov.au> wrote:
> UNOFFICIAL
>
> That's what I did. You just pass searcher.search a very large value for max
> docs so you get them all, then iterate through the ScoreDoc[] array - the
> docId is in scoreDoc.doc.
>
> Regards,
> Steve
>
> Stephen Gray
> Java Developer
> Border Midrange Systems Support
> Department of Immigration and Border Protection
> Phone: (02) 6223 9207
> Mobile: 0419 885 959
>
>
> -----Original Message-----
> From: Kyle Judson [mailto:kvjud...@hotmail.com]
> Sent: Sunday, 3 November 2013 12:37 AM
> To: java-user@lucene.apache.org
> Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]
>
> All,
>
> Is the best way to get the docIDs in a case like this to use
> IndexSearcher.search to get TopDocs and then get the ScoreDoc[] from
> TopDocs.scoreDocs?
>
> Thanks
>
> Kyle
>
>
> On 10/30/13 4:56 AM, "Michael McCandless" <luc...@mikemccandless.com>
> wrote:
>
>>You should try MultiDocValues first; it's trivial to use and may not be
>>horribly slow.
>>
>>It must do a binary-search for every docID lookup.
>>
>>And then if this is too slow, assuming you traverse the docIDs in
>>order, you can use IndexReader.leaves() to get the sub-readers. The
>>docIDs are just "appended" from these sub-readers, so you'd walk your
>>docIDs and also walk your sub-readers, moving to the next sub-reader
>>once you have a docID that's beyond its end. Each sub-reader spans
>>AtomicReaderContext.docBase to docBase +
>>AtomicReaderContext.reader.maxDoc().
>>
>>Mike McCandless
>>
>>http://blog.mikemccandless.com
>>
>>On Wed, Oct 30, 2013 at 2:21 AM, Stephen GRAY
>><stephen.g...@immi.gov.au>
>>wrote:
>>> UNOFFICIAL
>>> Hi everyone,
>>>
>>> I am trying to write an application that loops through 500,000 -
>>>1,000,000 documents returned by a search and calculates some
>>>statistics using the value in a stored field. Obviously this needs to
>>>be as fast as possible so I am using a NumericDocValues field to store the
>>>value.
>>>
>>> What I don't know is how to get the NumericDocValues value for each
>>>docId returned by the search. What I've been told to do in a previous
>>>thread was:
>>>
>>> 1. Split the docIds according to the segment they belong to
>>>
>>> 2. Get a per-segment NumericDocValues instance and use this to
>>>extract the values
>>>
>>> Can someone tell me how to do 1 and 2? I don't know how to discover
>>>what segment a given docId is in, or how to convert a segment into a
>>>NumericDocValues array.
>>>
>>> By the way it's also been suggested that I just use
>>>MultiDocValues.getNumericValues, but I gather that this will be much
>>>slower.
>>>
>>> I'd appreciate any help,
>>>
>>> Thanks,
>>> Steve
>>>
>>> UNOFFICIAL
>>>
>>>
>>> --------------------------------------------------------------------
>>> Important Notice: If you have received this email by mistake, please
>>>advise the sender and delete the message and attachments immediately.
>>>This email, including attachments, may contain confidential,
>>>sensitive, legally privileged and/or copyright information. Any
>>>review, retransmission, dissemination or other use of this
>>>information by persons or entities other than the intended recipient
>>>is prohibited. DIAC respects your privacy and has obligations under
>>>the Privacy Act 1988. The official departmental privacy policy can
>>>be viewed on the department's website at www.immi.gov.au.
>>>See:
>>> http://www.immi.gov.au/functional/privacy.htm
>>>
>>> ---------------------------------------------------------------------
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> UNOFFICIAL
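
For readers following the thread, the "walk your docIDs and also walk your sub-readers" idea can be sketched roughly as below. This is a Lucene-free illustration, not code from anyone in the thread: the class SegmentSplitter, its LocalDoc holder, and the plain int arrays are all hypothetical stand-ins. docBases[i] plays the role of AtomicReaderContext.docBase for the i-th leaf and maxDocs[i] the role of that leaf's reader maxDoc(). It assumes the docIDs arrive in ascending order, as the thread requires.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Lucene-free sketch of splitting sorted global docIDs by segment. */
final class SegmentSplitter {

    /** A hit resolved to a segment ordinal plus a segment-local docID. */
    static final class LocalDoc {
        final int segment;
        final int localDocId;

        LocalDoc(int segment, int localDocId) {
            this.segment = segment;
            this.localDocId = localDocId;
        }
    }

    /**
     * docBases[i] stands in for the docBase of leaf i, maxDocs[i] for that
     * leaf's maxDoc(). sortedDocIds must be in ascending order.
     */
    static List<LocalDoc> split(int[] docBases, int[] maxDocs, int[] sortedDocIds) {
        List<LocalDoc> out = new ArrayList<>(sortedDocIds.length);
        int seg = 0;
        for (int docId : sortedDocIds) {
            // Move to the next segment once docId is at or beyond the current one's end.
            while (seg < docBases.length && docId >= docBases[seg] + maxDocs[seg]) {
                seg++;
            }
            // Running off the end, or landing before the current base, means bad input.
            if (seg == docBases.length || docId < docBases[seg]) {
                throw new IllegalArgumentException("docID out of range or not sorted: " + docId);
            }
            // The segment-local docID is the global docID minus the segment's base.
            out.add(new LocalDoc(seg, docId - docBases[seg]));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 5, 8}; // three made-up segments
        int[] maxDocs = {5, 3, 4};
        int[] hits = {1, 4, 5, 7, 11};
        for (LocalDoc d : split(docBases, maxDocs, hits)) {
            System.out.println("segment=" + d.segment + " local=" + d.localDocId);
        }
    }
}
```

Because the segment cursor only ever moves forward, the whole pass is linear in the number of hits plus the number of segments, which is what avoids the per-lookup binary search mentioned for MultiDocValues. In a real Lucene 4.x application the segment-local docID would presumably be the value handed to that leaf's own NumericDocValues instance; that wiring is left out here and should be checked against the API docs for your version.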