lucene-dev  

Re: Getting word count

Dmitry Serebrennikov
Fri, 19 Oct 2001 13:29:42 -0700

>
>
>>You cannot simply count the number of times the method collect() is
>>called on your collector because some queries may result in the same
>>document being selected more than once and so you'd end up with a
>>double-count. (Can anyone confirm that this is the case?)
>>
>
>It should not be the case.  The collect() method should be called at most
>once per document.
>
>Doug
>
This is good news! It makes counting that much more efficient.
My main concern was the BooleanScorer, and I just verified that I was
worrying needlessly: it maintains its own hashtable to avoid double
counting. On a related issue, are there any guarantees about the order
of document numbers in the calls to collect()?
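
A minimal sketch of the counting collector discussed above, assuming (as
Doug states) that collect() is called at most once per document. The
HitCollector class below is a hypothetical local stand-in with the same
collect(int doc, float score) shape, so the example compiles without
Lucene; CountingCollector and its method names are illustrative only.
It also records whether document numbers arrived in increasing order,
which lets you observe the ordering empirically but does not establish
any guarantee.

```java
// Hypothetical stand-in for Lucene's HitCollector, declared locally so
// this sketch compiles on its own (assumption: same method shape).
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Counts hits by counting collect() calls. This is only correct if
// collect() is invoked at most once per matching document.
class CountingCollector extends HitCollector {
    private int count = 0;
    private int lastDoc = -1;
    private boolean increasing = true;

    @Override
    public void collect(int doc, float score) {
        count++;
        // Note any call whose document number does not exceed the
        // previous one; this is an observation, not a guarantee.
        if (doc <= lastDoc) {
            increasing = false;
        }
        lastDoc = doc;
    }

    // Number of collect() calls seen so far.
    public int getCount() {
        return count;
    }

    // True if every document number seen so far was strictly greater
    // than the one before it.
    public boolean sawIncreasingOrder() {
        return increasing;
    }
}
```

With Lucene on the classpath, one would presumably extend the real
org.apache.lucene.search.HitCollector instead and pass the collector to
the searcher's search(query, collector) method, then read getCount()
once the search returns.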