Well, counting raw frequency on its own isn't the best approach. For instance, is a field with 1,000 terms and 10 occurrences of your target a better match than a field with 10 terms and 5 occurrences of your target?
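To put numbers on that comparison, here is a minimal sketch that divides the occurrence count by the field length. The class and method names are hypothetical, and this simple ratio is only an illustration of length normalization, not the formula Lucene itself uses.

```java
// Hypothetical helper, not part of Lucene: compares the two fields
// described above by the share of their terms that match the target.
class RelativeFrequency {
    // Fraction of a field's terms that are the target term.
    static double relativeFrequency(int occurrences, int totalTerms) {
        if (totalTerms <= 0) {
            throw new IllegalArgumentException("totalTerms must be positive");
        }
        return (double) occurrences / totalTerms;
    }

    public static void main(String[] args) {
        // 10 occurrences in a 1,000-term field: 1% of the field.
        System.out.println("long field:  " + relativeFrequency(10, 1000));
        // 5 occurrences in a 10-term field: 50% of the field.
        System.out.println("short field: " + relativeFrequency(5, 10));
    }
}
```

The longer field wins on raw count (10 versus 5), but the shorter field is far more concentrated on the target (0.5 versus 0.01).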
This kind of thing is already taken into account by Lucene scoring; you might want to look at this:
http://lucene.apache.org/java/2_4_0/scoring.html

Best
Erick

On Thu, May 6, 2010 at 11:48 PM, manjula wijewickrema <manjul...@gmail.com> wrote:

> Hi Erick,
>
> Thanks for the reply. What I want to do is to identify the key terms and key
> phrases of a document according to their number of occurrences in the
> document. The output should be the highest-frequency words and (two- or
> three-word) phrases. Can I use Lucene for this purpose?
>
> Thanks
> Manjula
>
> On Thu, May 6, 2010 at 6:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > Terms are relatively easy, see TermFreqVector in the JavaDocs.
> >
> > Phrases aren't as easy. Before you go there, though, what is the
> > high-level problem you're trying to solve? Possibly this is an XY problem
> > (see http://people.apache.org/~hossman/#xyproblem).
> >
> > Best
> > Erick
> >
> > On Thu, May 6, 2010 at 6:39 AM, manjula wijewickrema <manjul...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am new to Lucene. If I want to know the term or phrase frequency of an
> > > input document, will it be possible through Lucene?
> > >
> > > Thanks,
> > > Manjula
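
For the request in the thread above (the highest-frequency words and two- or three-word phrases of a single document), the following is a rough plain-Java sketch. It does not use Lucene or TermFreqVector at all; the class name, the method names, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions made for illustration. It does no stop-word removal, and phrases are counted across sentence boundaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, independent of Lucene: counts terms and short phrases.
class PhraseFrequency {
    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Count every run of n consecutive tokens (n = 1 gives plain term counts).
    static Map<String, Integer> countNGrams(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // The k most frequent entries; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(k, entries.size()));
    }

    public static void main(String[] args) {
        List<String> tokens =
            tokenize("Open source search. Open source search engines index text.");
        // Print the three most frequent terms, two-word and three-word phrases.
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + "-grams: " + top(countNGrams(tokens, n), 3));
        }
    }
}
```

As the first reply points out, raw counts like these favour long documents and common words, so they are a starting point rather than a relevance measure.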