Hi Stephen,

On Thu, Oct 24, 2013 at 1:18 AM, Stephen GRAY <stephen.g...@immi.gov.au> wrote:
> I actually need to loop through a large number of documents (50,000 -
> 100,000) calculating a number of statistics (min, max, sum) so I really need
> the most efficient/fastest solution available. It sounds like it would be
> best to just store the data in a stored field.

I see. For that many documents, doc values are actually the right thing to use. Sorry if I put you on the wrong track: I was assuming you were only going to collect values from a few documents.

In your case the best option would be to split your doc ids according to the segment they belong to and then, for each segment, get a per-segment NumericDocValues instance and aggregate your statistics. This is better than using MultiDocValues, because MultiDocValues needs to binary-search for the appropriate segment for every document.

-- 
Adrien
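
For illustration, the per-segment approach described above could be sketched roughly as follows. This is a minimal, Lucene-free sketch rather than actual Lucene code: the class name PerSegmentStats, the method aggregate, and the plain long[] arrays standing in for per-segment NumericDocValues are all hypothetical. In a real Lucene 4.x index the doc bases would presumably come from each AtomicReaderContext's docBase field and the values from the segment reader's getNumericDocValues(field). The idea is to sort the global doc ids once and then walk the segments in order, so no per-document binary search is needed.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
final class PerSegmentStats {

    /**
     * Aggregates min, max and sum over the given global doc ids.
     *
     * docBases[i] is the first global doc id of segment i (ascending, starting at 0).
     * segmentValues[i][localDoc] stands in for a per-segment NumericDocValues lookup.
     * Returns {min, max, sum}; for empty input min/max keep their sentinel values.
     */
    static long[] aggregate(int[] globalDocIds, int[] docBases, long[][] segmentValues) {
        // Sort a copy so that doc ids are visited segment by segment.
        int[] sorted = globalDocIds.clone();
        Arrays.sort(sorted);

        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        int seg = 0; // index of the segment currently being read

        for (int docId : sorted) {
            // Advance to the segment that contains docId. Because the ids are
            // sorted, the segment index only ever moves forward.
            while (seg + 1 < docBases.length && docId >= docBases[seg + 1]) {
                seg++;
            }
            // Convert the global doc id to a segment-local one.
            int localDoc = docId - docBases[seg];
            long value = segmentValues[seg][localDoc];

            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
        }
        return new long[] {min, max, sum};
    }
}
```

With two segments holding the values {5, 1, 9} and {4, 7} (doc bases 0 and 3), calling aggregate for the global doc ids 4, 0, 2, 3 reads each segment's values in a single forward pass. The same loop structure should carry over to real per-segment NumericDocValues instances.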