Re: highlight - scoring fragments with more of the same token

markharw00d Tue, 26 Sep 2006 08:46:29 -0700

If you were to score repeated terms then I suspect it would have to be done so that the repetitions didn't score as highly as the first occurrence - otherwise f2 could be selected as a better fragment than f3 for the query q1 in your example.

Repetitions of a term in a fragment could be scored as a very small fraction of the score given to the first occurrence. This would at least rank f2 higher than f1 for query q2.

Another potentially useful ranking factor may be to boost fragments found at the beginning of a document - that's where people tend to write summaries or introductions.
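
As a rough illustration of the damping idea for repeated terms, here is a minimal standalone sketch. It is not the Highlighter's QueryScorer code: the class name DampedFragmentScorer, the whitespace tokenization and the REPEAT_WEIGHT value of 0.01 are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical standalone scorer, not part of Lucene.
final class DampedFragmentScorer {

    // Assumed value: each repetition earns 1% of the first occurrence's score.
    static final double REPEAT_WEIGHT = 0.01;

    // Scores a whitespace-tokenized fragment against a whitespace-tokenized query.
    static double score(String fragment, String query) {
        Set<String> queryTerms =
                new HashSet<>(Arrays.asList(query.trim().split("\\s+")));

        // Count how often each query term occurs in the fragment.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : fragment.trim().split("\\s+")) {
            if (queryTerms.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }

        // First occurrence scores 1.0; every further occurrence adds only
        // a small fraction, so a repeat never outweighs a new distinct term.
        double score = 0.0;
        for (int count : counts.values()) {
            score += 1.0 + REPEAT_WEIGHT * (count - 1);
        }
        return score;
    }
}
```

With the fragments from the example, this keeps f3 ahead of f2 for q1 (two distinct terms beat one repeated term) and puts f2 slightly ahead of f1 for q2. A boost for fragments near the start of the document could be applied as an extra multiplier on the returned score.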


Doron Cohen wrote:

This question was raised in the user's list -
http://www.nabble.com/highlighting-tf2322109.html

Assume three fragments and two queries:
  f1 = aa  11  bb  33  cc
  f2 = aa  11  bb  11  cc
  f3 = aa  11  bb  22  cc
  q1 = 11 22
  q2 = 11
Now we call highlighter.getBestFragment(q);
For q1, f3 is returned, as expected.
For q2, f1 is returned, although "11" appears twice in f2 but only once in
f1.

This is because QueryScorer.getTokenScore(Token) counts only unique
fragment tokens.
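
The effect can be reproduced with a small standalone sketch. This is not the actual QueryScorer code: the class UniqueTermFragmentScorer, its whitespace tokenization and the "first best fragment wins" tie-break are assumptions made only to illustrate unique-token counting.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of unique-token scoring, not part of Lucene.
final class UniqueTermFragmentScorer {

    // Each query term counts at most once per fragment, however often it occurs.
    static int score(String fragment, String query) {
        Set<String> matched =
                new HashSet<>(Arrays.asList(fragment.trim().split("\\s+")));
        matched.retainAll(
                new HashSet<>(Arrays.asList(query.trim().split("\\s+"))));
        return matched.size();
    }

    // Returns the index of the best fragment; on a tie the earliest one is
    // kept (an assumed tie-break for this sketch).
    static int bestIndex(String[] fragments, String query) {
        int best = 0;
        for (int i = 1; i < fragments.length; i++) {
            if (score(fragments[i], query) > score(fragments[best], query)) {
                best = i;
            }
        }
        return best;
    }
}
```

Under these assumptions f1 and f2 get the same score for q2, so the second "11" in f2 makes no difference and the earlier fragment is kept.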

Would it make sense to make this behavior controllable?
(It is easily done but I am not sure about the consequences.)

Or perhaps there is a way to achieve this behavior (preferring f2 over f1 for
q2 above) that I missed?



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

