We use boosts calculated from the term frequencies and the standard alpha, beta, and gamma multipliers from Rocchio. Terms from non-relevant documents decrement the frequency. If a term's weight ends up <= 0, we remove the term (someone has posted a contribution for dealing with negative weights; we just haven't adopted it yet). I am sure there is more you could do; we just haven't investigated much. We also give more weight to things we think are more important based on our NLP analysis.
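To make the weighting concrete, here is a rough sketch of the idea in Python rather than our actual code. The function names, the default alpha/beta/gamma values (common textbook choices), and the averaging over the number of documents are all assumptions for illustration; the NLP-based weighting is not shown. The output uses the standard Lucene query syntax `term^boost`.

```python
from collections import Counter, defaultdict


def rocchio_boosts(query_terms, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return {term: weight} from Rocchio-style weights over raw term frequencies.

    Each document is a list of tokens. The alpha/beta/gamma defaults are
    illustrative assumptions, not values taken from the original post.
    """
    weights = defaultdict(float)

    # Original query terms contribute alpha each.
    for term in query_terms:
        weights[term] += alpha

    # Relevant documents add beta * (average term frequency).
    for doc in relevant_docs:
        for term, freq in Counter(doc).items():
            weights[term] += beta * freq / len(relevant_docs)

    # Non-relevant documents subtract gamma * (average term frequency).
    for doc in nonrelevant_docs:
        for term, freq in Counter(doc).items():
            weights[term] -= gamma * freq / len(nonrelevant_docs)

    # Drop any term whose weight is <= 0.
    return {term: w for term, w in weights.items() if w > 0}


def to_lucene_query(boosts):
    """Render the weights as a boosted query string, highest weight first."""
    ordered = sorted(boosts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(f"{term}^{weight:.2f}" for term, weight in ordered)


if __name__ == "__main__":
    boosts = rocchio_boosts(
        query_terms=["lucene"],
        relevant_docs=[["lucene", "feedback", "feedback"]],
        nonrelevant_docs=[["spam", "lucene"]],
    )
    print(to_lucene_query(boosts))
```

In this sketch a term that appears only in non-relevant documents ends up with a negative weight and is simply dropped, which matches the removal rule described above; handling negative weights some other way would replace the final filter.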

Ian Soboroff wrote:

Grant Ingersoll <[EMAIL PROTECTED]> writes:

You stole my thunder!  :-)  Was going to post the URL after doing the
actual talk, but that's all right.  I will post a few changes I have
made on the plane tonight or tomorrow to the website below.

Let me know if you have any questions...

I have one.  I've been thinking about the problem with doing relevance
feedback in Lucene, and I appreciate seeing your code on getting the
top terms from a single document.

However, the real problem for RF and pseudo-RF techniques is forming
the query.  You can obviously add terms to a query, but how are you
handling the weighting?  With boosts, or something more sophisticated?

Ian


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
