RE: splitting docIds from a search by segment [SEC=UNOFFICIAL]

Stephen GRAY Sun, 03 Nov 2013 14:38:58 -0800

UNOFFICIAL

That's what I did. You just pass searcher.search a very large value for the 
maximum number of hits so you get them all, then iterate through the 
TopDocs.scoreDocs array (a ScoreDoc[]) - the docId is in scoreDoc.doc.
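One thing to watch: the ScoreDoc[] from a default search is ordered by score, not by docId, while the per-segment walk Mike describes in the quoted message assumes ascending docIds. Here is a minimal sketch of pulling out and sorting the ids, written in plain Java so it runs without Lucene on the classpath. Hit is a hypothetical stand-in for ScoreDoc (only its doc and score fields are mirrored) and DocIdCollector is a made-up name, not a Lucene class.

```java
import java.util.Arrays;

// Hypothetical stand-in for Lucene's ScoreDoc: only the two fields used here.
final class Hit {
    final int doc;      // global docId, like ScoreDoc.doc
    final float score;  // relevance score, like ScoreDoc.score
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

final class DocIdCollector {
    // Copies the docId out of each hit and sorts the result ascending,
    // so the ids can later be walked segment by segment in order.
    static int[] sortedDocIds(Hit[] hits) {
        int[] ids = new int[hits.length];
        for (int i = 0; i < hits.length; i++) {
            ids[i] = hits[i].doc;
        }
        Arrays.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        // Hits arrive in descending score order, as a default search returns them.
        Hit[] hits = { new Hit(42, 3.1f), new Hit(7, 2.5f), new Hit(19, 0.4f) };
        System.out.println(Arrays.toString(sortedDocIds(hits))); // prints [7, 19, 42]
    }
}
```

With real Lucene objects the loop body would presumably read topDocs.scoreDocs[i].doc instead of hits[i].doc; the sorting step is the same.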


Regards,
Steve

Stephen Gray
Java Developer
Border Midrange Systems Support
Department of Immigration and Border Protection
Phone: (02) 6223 9207
Mobile: 0419 885 959


-----Original Message-----
From: Kyle Judson [mailto:kvjud...@hotmail.com]
Sent: Sunday, 3 November 2013 12:37 AM
To: java-user@lucene.apache.org
Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

All,

Is the best way to get the docIDs in a case like this to use 
IndexSearcher.search to get TopDocs and then get the ScoreDoc[] from 
TopDocs.scoreDocs?

Thanks

Kyle


On 10/30/13 4:56 AM, "Michael McCandless" <luc...@mikemccandless.com>
wrote:

>You should try MultiDocValues first; it's trivial to use and may not be 
>horribly slow.
>
>It must do a binary-search for every docID lookup.
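The per-lookup binary search mentioned here can be pictured as finding the last segment whose starting docID is at or below the global docID. The following is only a sketch of that idea in plain Java, not Lucene's actual code: SegmentLookup and segmentOf are hypothetical names, and the docBase array is an assumed stand-in for the docBase of each sub-reader, in order, with docBase[0] == 0.

```java
final class SegmentLookup {
    // Returns the index of the last segment whose docBase is <= docId.
    // Assumes docBase is ascending, docBase[0] == 0 and docId >= 0.
    static int segmentOf(int docId, int[] docBase) {
        int lo = 0;
        int hi = docBase.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always shrinks
            if (docBase[mid] <= docId) {
                lo = mid;       // segment mid starts at or before docId
            } else {
                hi = mid - 1;   // segment mid starts after docId
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] docBase = {0, 100, 250}; // three segments starting at 0, 100, 250
        System.out.println(segmentOf(99, docBase));  // prints 0
        System.out.println(segmentOf(100, docBase)); // prints 1
        System.out.println(segmentOf(260, docBase)); // prints 2
    }
}
```

Each lookup costs O(log S) for S segments, which is the overhead being weighed against the in-order walk described next.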
>
>And then if this is too slow, assuming you traverse the docIDs in 
>order, you can use IndexReader.leaves() to get the sub-readers.  The 
>docIDs are just "appended" from these sub-readers, so you'd walk your 
>docIDs and also walk your sub-readers, moving to the next sub-reader 
>once you have a docID that's beyond its end.  Each sub-reader spans 
>AtomicReaderContext.docBase (inclusive) to docBase + 
>AtomicReaderContext.reader().maxDoc() (exclusive).
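The walk described above can be sketched in plain Java, with two int arrays standing in for the Lucene objects. This is an illustration under stated assumptions rather than a definitive implementation: SegmentWalk and splitBySegment are hypothetical names, docBase[i] plays the role of the i-th leaf's AtomicReaderContext.docBase, maxDoc[i] plays the role of that leaf's maxDoc(), and the incoming docIDs are assumed to be sorted ascending.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentWalk {
    // Splits ascending global docIDs into per-segment lists of segment-local ids.
    // Segment i covers global ids docBase[i] (inclusive) to docBase[i] + maxDoc[i] (exclusive).
    static List<List<Integer>> splitBySegment(int[] sortedDocIds, int[] docBase, int[] maxDoc) {
        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < docBase.length; i++) {
            perSegment.add(new ArrayList<Integer>());
        }
        int leaf = 0;
        for (int docId : sortedDocIds) {
            // Move to the next segment once docId is beyond the current one's end.
            while (leaf < docBase.length && docId >= docBase[leaf] + maxDoc[leaf]) {
                leaf++;
            }
            if (leaf == docBase.length) {
                throw new IllegalArgumentException("docId " + docId + " is beyond the last segment");
            }
            // Subtracting docBase turns the global id into a segment-local id.
            perSegment.get(leaf).add(docId - docBase[leaf]);
        }
        return perSegment;
    }

    public static void main(String[] args) {
        int[] docBase = {0, 100, 250};
        int[] maxDoc = {100, 150, 50};
        int[] hits = {5, 99, 100, 260, 299};
        System.out.println(splitBySegment(hits, docBase, maxDoc)); // prints [[5, 99], [0], [10, 49]]
    }
}
```

The segment-local ids are what a per-segment NumericDocValues instance would be asked for. Because the segment index only ever moves forward, the whole pass is linear in the number of docIDs plus the number of segments, with no per-document search.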
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Wed, Oct 30, 2013 at 2:21 AM, Stephen GRAY 
><stephen.g...@immi.gov.au>
>wrote:
>> UNOFFICIAL
>> Hi everyone,
>>
>> I am trying to write an application that loops through 500,000 -
>>1,000,000 documents returned by a search and calculates some 
>>statistics using the value in a stored field. Obviously this needs to 
>>be as fast as possible so I am using a NumericDocValues field to store the 
>>value.
>>
>> What I don't know is how to get the NumericDocValues value for each 
>>docId returned by the search. What I've been told to do in a previous 
>>thread was:
>>
>> 1. Split the docIds according to the segment they belong to
>>
>> 2. Get a per-segment NumericDocValues instance and use this to
>>extract the values
>>
>> Can someone tell me how to do 1 and 2? I don't know how to discover 
>>what segment a given docId is in, or how to convert a segment into a 
>>NumericDocValues array.
>>
>> By the way it's also been suggested that I just use 
>>MultiDocValues.getNumericValues, but I gather that this will be much 
>>slower.
>>
>> I'd appreciate any help,
>>
>> Thanks,
>> Steve
>>
>> UNOFFICIAL
>>
>>
>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>For additional commands, e-mail: java-user-h...@lucene.apache.org
>



