We use boosts calculated from the term frequencies and the standard
alpha, beta, gamma multipliers from Rocchio. Terms from non-relevant
documents decrement the frequency. If a term's weight ends up <= 0, we
remove the term (someone has posted a contribution for dealing with
negative weights; we just haven't adopted it yet). I am sure there is
more you could do; we just haven't investigated much. We also give
different weights to things we think are more important, based on our
NLP analysis.
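
A minimal sketch of this kind of Rocchio-style boosting, in Python for
brevity. It is not the code described above: the function names, the
default alpha/beta/gamma values (common textbook choices), and the
averaging over the number of feedback documents are all assumptions made
for illustration, and the NLP-based weighting is not modeled. The output
uses the Lucene query parser's term^boost syntax.

```python
from collections import Counter


def rocchio_boosts(query_terms, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return {term: boost} from a Rocchio-style update on raw term frequencies.

    Each document is a list of already-analyzed terms. The default
    multipliers are illustrative assumptions, not values from this thread.
    """
    weights = Counter()

    # Original query terms contribute alpha each.
    for term in query_terms:
        weights[term] += alpha

    # Terms from relevant documents add beta * average frequency.
    for doc in relevant_docs:
        for term, tf in Counter(doc).items():
            weights[term] += beta * tf / len(relevant_docs)

    # Terms from non-relevant documents subtract gamma * average frequency.
    for doc in nonrelevant_docs:
        for term, tf in Counter(doc).items():
            weights[term] -= gamma * tf / len(nonrelevant_docs)

    # Drop any term whose weight is <= 0, as described above.
    return {term: w for term, w in weights.items() if w > 0}


def to_query_string(boosts):
    """Render boosts as 'term^boost' clauses, highest boost first.

    Assumes terms contain no characters that need query-parser escaping.
    """
    ordered = sorted(boosts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(f"{term}^{weight:.2f}" for term, weight in ordered)


if __name__ == "__main__":
    boosts = rocchio_boosts(
        query_terms=["lucene"],
        relevant_docs=[["lucene", "feedback", "feedback"]],
        nonrelevant_docs=[["spam", "lucene"]],
    )
    print(to_query_string(boosts))
```

In the example at the bottom, "spam" appears only in the non-relevant
document, so its weight goes negative and it is dropped from the
expanded query.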
Ian Soboroff wrote:
Grant Ingersoll <[EMAIL PROTECTED]> writes:
You stole my thunder! :-) Was going to post the URL after doing the
actual talk, but that's all right. I will post a few changes I have
made on the plane tonight or tomorrow to the website below.
Let me know if you have any questions...
I have one. I've been thinking about the problem with doing relevance
feedback in Lucene, and I appreciate seeing your code on getting the
top terms from a single document.
However, the real problem for RF and pseudo-RF techniques is forming
the query. You can obviously add terms to a query, but how are you
handling the weighting? With boosts, or something more sophisticated?
Ian
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886