Re: Term/Phrase frequencies

Erick Erickson Fri, 07 May 2010 07:53:33 -0700

Well, counting frequency isn't the best approach. For instance, if a field
has 1,000 terms and 10 occurrences of your target, is that a better match
than a field with 10 terms and 5 occurrences of your target?


This kind of thing is already taken into account with Lucene scoring, you
might
want to look at this:
http://lucene.apache.org/java/2_4_0/scoring.html

<http://lucene.apache.org/java/2_4_0/scoring.html>Best
Erick

On Thu, May 6, 2010 at 11:48 PM, manjula wijewickrema
<manjul...@gmail.com>wrote:

> Hi Erik,
>
> Thanks for the reply. What I want to do is, to identify key terms and key
> phrases of a document according to their number of occurences in the
> document. Output should be the highest freequency words and (two or three
> word) phrases. For this purpose can I use Lucene?
>
> Thanks
> Manjula
>
> On Thu, May 6, 2010 at 6:09 PM, Erick Erickson <erickerick...@gmail.com
> >wrote:
>
> > Terms are relatively easy, see TermFreqVector in the JavaDocs.
> >
> > Phrases aren't as easy, before you go there, though, what is the
> > high-level problem you're trying to solve? Possibly this is an XY problem
> > (see http://people.apache.org/~hossman/#xyproblem).
> >
> > Best
> > Erick
> >
> > On Thu, May 6, 2010 at 6:39 AM, manjula wijewickrema <
> manjul...@gmail.com
> > >wrote:
> >
> > > Hi,
> > >
> > > I am new to Lucene. If I want to know the term or phrase frequency of
> an
> > > input document, will it be possible through Lucene?
> > >
> > > Thanks,
> > > Manjula
> > >
> >
>

Re: Term/Phrase frequencies

Reply via email to