Doug Cutting
Fri, 19 Oct 2001 12:30:55 -0700
> From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
>
> If you are referring to the number of documents containing a particular
> term, that is available from IndexReader.termDocs(Term t). However, if
> it is anything more complex than a single term (like a phrase or some
> other query), I think the only way is to actually run a search on this
> query and get the length of the Hits object returned.

That's right.

> Slightly more efficient, but requiring a bit more work, is to create a
> HitCollector that uses a BitVector (see org.apache.lucene.util.BitVector)
> to mark off documents that the searcher finds. Afterwards you can get the
> count from the bit vector. This will skip over sorting that is done in
> the standard HitCollector.

You don't need the bit vector. You can just count the number of times
that collect() is called.

> You cannot simply count the number of times the method collect() is
> called on your collector because some queries may result in the same
> document being selected more than once and so you'd end up with a
> double-count. (Can anyone confirm that this is the case?)

It should not be the case. The collect() method should be called at most
once per document.

Doug
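
For illustration, here is a minimal sketch of the counting approach described above. It is not taken from the Lucene sources. The HitCollector class below is a stand-in so the example compiles on its own, and its collect(int doc, float score) signature is an assumption about the real org.apache.lucene.search.HitCollector. CountingCollector and getCount() are hypothetical names.

```java
// Stand-in for Lucene's HitCollector so this sketch compiles without
// Lucene on the classpath. The (int doc, float score) signature is an
// assumption about the real class.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical collector that counts hits by counting collect() calls.
// This is only correct if collect() is called at most once per document.
class CountingCollector extends HitCollector {
    private int count = 0;

    @Override
    public void collect(int doc, float score) {
        // Neither the document number nor the score is needed for a count.
        count++;
    }

    public int getCount() {
        return count;
    }
}
```

With the real library, the collector would presumably be passed to the searcher in place of the stand-in usage, along the lines of searcher.search(query, collector), and getCount() read afterwards. No Hits object is built, so the sorting step is skipped.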