I believe the usual solution is to keep a separate, unstemmed field on the same document for display purposes (I am assuming you are trying to display the values of the indexed field). The tradeoff is disk space, of course.
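As a rough illustration of why keeping the unstemmed text around helps, here is a minimal plain-Java sketch. It is a variation on the suggestion above rather than Lucene code: it groups words by stem but remembers which original word produced each stem, so "parti" can be shown as "party". The class name `DisplayForms`, the method `displayForms`, and the toy stemmer in `main` are all hypothetical; a real setup would take its stems from whatever analyzer the index uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Lucene.
class DisplayForms {
    /**
     * For each stem, picks the most frequent unstemmed word that produced it.
     * Ties go to the alphabetically first word.
     */
    static Map<String, String> displayForms(List<String> words,
                                            Function<String, String> stemmer) {
        // stem -> (original word -> number of occurrences)
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String word : words) {
            counts.computeIfAbsent(stemmer.apply(word), k -> new HashMap<>())
                  .merge(word, 1, Integer::sum);
        }

        Map<String, String> display = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> f : e.getValue().entrySet()) {
                int c = f.getValue();
                if (c > bestCount || (c == bestCount && f.getKey().compareTo(best) < 0)) {
                    best = f.getKey();
                    bestCount = c;
                }
            }
            display.put(e.getKey(), best);
        }
        return display;
    }

    public static void main(String[] args) {
        // Toy stemmer for the demo only; an assumption standing in for a real analyzer.
        Map<String, String> toy = new HashMap<>();
        toy.put("party", "parti");
        toy.put("parties", "parti");
        toy.put("picture", "pictur");
        toy.put("pictures", "pictur");

        List<String> words = Arrays.asList(
                "party", "party", "parties", "picture", "pictures", "pictures");
        System.out.println(displayForms(words, toy::get));
    }
}
```

The extra map is the in-memory analogue of the disk-space tradeoff mentioned above: the original words have to be stored somewhere in addition to the stems.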

Chris Brown wrote:

Okay, I've taken Grant's advice and aggregated the TermFreqVectors for
each term in the applicable field. It works quite well; there's just one
glitch.

Some words like "party" and "picture" appear as "parti" and "pictur". I am
using the SnowballAnalyzer, and I suspect that's what's changing the words.
Short of maintaining a second index using a different analyzer, does anyone
have any ideas?

----- Original Message ----- From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Monday, January 09, 2006 12:34 PM
Subject: Re: top n words within a results set?


You could use term vectors to accomplish this. Get your hits for the website, then load the term vector for the field containing the keywords and add up the frequencies.
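The "add up the frequencies" step can be sketched in plain Java as follows. This is only an illustration under stated assumptions: each hit's term vector is represented as a `Map<String, Integer>` of term to frequency, and how those maps are read out of Lucene is deliberately left out. The class name `TopTerms` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TopTerms {
    /** Sums per-document term frequencies into one map of term -> total count. */
    static Map<String, Integer> aggregate(List<Map<String, Integer>> perDocFreqs) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> doc : perDocFreqs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    /** Returns up to n terms with the highest totals; ties are broken alphabetically. */
    static List<String> topN(Map<String, Integer> totals, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Two made-up documents standing in for the term vectors of two hits.
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("beach", 3);
        doc1.put("party", 2);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("party", 4);
        doc2.put("sunset", 3);

        Map<String, Integer> totals = aggregate(Arrays.asList(doc1, doc2));
        System.out.println(topN(totals, 2));
    }
}
```

For a large result set, sorting every distinct term may be more work than needed; a bounded priority queue of size n would be a reasonable alternative.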

Chris Brown wrote:

Hello,

Is it possible to retrieve the top 'n' most frequently appearing words within the results of a search? I've seen the High Frequency Terms code in the sandbox, but it works across the whole index.

To put this question into context: we're developing a website that hosts users' photo websites. Searches can be specific to a particular user's website or be performed globally across one, many or all websites. I've accomplished this with a field in the index called website. What I'd like to do is give each user the top ten words that appear on their website.
Thanks,
Chris Brown

http://www.orangepics.com/



--
------------------------------------------------------------------- Grant Ingersoll Sr. Software Engineer Center for Natural Language Processing Syracuse University School of Information Studies 337 Hinds Hall Syracuse, NY 13244
http://www.cnlp.org Voice:  315-443-5484 Fax: 315-443-6886

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


