Hello again,

Thanks for your answer, Dmitry.

Indeed, simple terms would be too easy ;-) I also need to know the number of occurrences of exact phrases.

The problem is that I do not want to count the number of documents, but the total number of occurrences across the whole index. For example, I want to know how many times the exact phrase "personal computer" occurs in all the documents of the index. Counting the hits is not appropriate for this.

Thanks a lot,

Julien

> If you are referring to the number of documents containing a particular
> term, that is available from IndexReader.termDocs(Term t). However, if
> it is anything more complex than a single term (like a phrase or some
> other query), I think the only way is to actually run a search on this
> query and get the length of the Hits object returned. Slightly more
> efficient, but requiring a bit more work, is to create a HitCollector
> that uses a BitVector (see org.apache.lucene.util.BitVector) to mark off
> documents that the searcher finds. Afterwards you can get the count from
> the bit vector. This will skip over sorting that is done in the standard
> HitCollector. You cannot simply count the number of times the method
> collect() is called on your collector because some queries may result in
> the same document being selected more than once and so you'd end up with
> a double-count. (Can anyone confirm that this is the case?)
>
> Nioche, Julien wrote:
>
> >Hello All,
> >
> >I'm trying to get word count information for exact phrases, i.e. to know
> >how many times a given form occurs in the index. Does anyone know how I can
> >do this in a clean way?
> >
> >Does it require modifying the score() methods of the different Scorers? Or
> >is this information already computed somewhere else?
> >
> >Thanks a lot for your help
> >
> >Julien Nioche
> >
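
To make the distinction in this thread concrete (documents that match a phrase versus total occurrences of the phrase), here is a minimal sketch in plain Java. It works on in-memory strings and does not use any Lucene API; the class name PhraseOccurrenceCounter, its methods, and the simple lowercase word tokenizer are all assumptions made for illustration, standing in for whatever analyzer and index access would really be used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: plain Java over in-memory text, not Lucene code. */
final class PhraseOccurrenceCounter {

    /** Splits text into lowercase word tokens (a hypothetical stand-in for an analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Counts positions in one document where the phrase tokens appear consecutively. */
    static int countInDocument(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + phrase.size() <= doc.size(); i++) {
            if (doc.subList(i, i + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    /** Total occurrences of the phrase across all documents (the figure asked for). */
    static int totalOccurrences(List<String> docs, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        int total = 0;
        for (String doc : docs) {
            total += countInDocument(tokenize(doc), phraseTokens);
        }
        return total;
    }

    /** Number of documents containing the phrase at least once (what a hit count gives). */
    static int matchingDocuments(List<String> docs, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        int matching = 0;
        for (String doc : docs) {
            if (countInDocument(tokenize(doc), phraseTokens) > 0) {
                matching++;
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "A personal computer is not a personal assistant.",
            "Personal computer sales rose; every personal computer shipped.",
            "No match here.");
        System.out.println("documents: " + matchingDocuments(docs, "personal computer"));
        System.out.println("occurrences: " + totalOccurrences(docs, "personal computer"));
    }
}
```

With these three sample documents the two figures differ, because the second document contains the phrase twice. That gap is why a hit count alone cannot answer the question.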