Okay great! Thanks for the quick response and pointing me in the right
direction. I'll go get out my Lucene in Action book ;) and learn all about
term vectors.
----- Original Message -----
From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Monday, January 09, 2006 12:34 PM
Subject: Re: top n words within a results set?
You could use term vectors to accomplish this. Get your hits for the
website, then load the term vector for the field containing the keywords
and add up the frequencies
Chris Brown wrote:
Hello,
Is it possible to retrieve the top 'n' most often appearing words within a
search criteria? I've seen the High Frequency Terms code in the sandbox
but it works across the whole index.
To put this question into context: We're developing website that hosts a
user's photo website. Searches can be specific to a particular user's
website or be performed globally across one, many or all websites. I've
accomplished this with a field in the index called website. What I'd like
to do is give each user the top ten words that appear on their website.
Thanks,
Chris Brown
http://www.orangepics.com/
--
-------------------------------------------------------------------
Grant Ingersoll Sr. Software Engineer Center for Natural Language
Processing Syracuse University School of Information Studies 337 Hinds
Hall Syracuse, NY 13244
http://www.cnlp.org Voice: 315-443-5484 Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]