If you were to score repeated terms, then I suspect the repetitions would
have to score lower than the first occurrence; otherwise f2 ("11" twice)
could be selected as a better fragment than f3 ("11" and "22") for query
q1 in your example.
Repetitions of a term in a fragment could be scored as a very small
fraction of the score given to the first occurrence. This would at least
rank f2 higher than f1 for query q2.
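Here is a minimal Java sketch of that idea. It is a standalone model, not QueryScorer. It assumes whitespace-separated tokens and a weight of 1.0 for the first occurrence of each query term. Each repeat gets a hypothetical REPEAT_FACTOR of 0.1; that value is illustrative only and does not come from Lucene.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiminishingRepeatScorer {
    // Hypothetical weight for each repeat of a term already seen in the fragment.
    static final double REPEAT_FACTOR = 0.1;

    // Scores a whitespace-tokenized fragment: the first occurrence of a query
    // term adds 1.0, each further occurrence of it adds only REPEAT_FACTOR.
    static double score(String fragment, Set<String> queryTerms) {
        Map<String, Integer> seen = new HashMap<>();
        double total = 0.0;
        for (String token : fragment.split("\\s+")) {
            if (!queryTerms.contains(token)) {
                continue;
            }
            int count = seen.merge(token, 1, Integer::sum); // occurrences so far
            total += (count == 1) ? 1.0 : REPEAT_FACTOR;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> fragments = List.of("aa 11 bb 33 cc", "aa 11 bb 11 cc", "aa 11 bb 22 cc");
        Set<String> q1 = Set.of("11", "22");
        Set<String> q2 = Set.of("11");
        for (String f : fragments) {
            System.out.printf("%s  q1=%.1f  q2=%.1f%n", f, score(f, q1), score(f, q2));
        }
    }
}
```

On the example fragments, f3 still scores highest for q1 (2.0, against 1.1 for f2). For q2, f2 (1.1) now beats f1 (1.0).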
Another potentially useful ranking factor would be a boost for fragments
found near the beginning of a document, since that is where people tend to
write summaries or introductions.
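Here is one way such a boost could be expressed. The sketch assumes each fragment knows its zero-based position in the document. A hypothetical POSITION_BOOST constant controls the strength. The term score is multiplied by a factor that starts at 1 + POSITION_BOOST for the first fragment and decays towards 1.0 for later ones. Neither the constant nor the decay shape comes from Lucene.

```java
public class PositionBoost {
    // Hypothetical tuning constant: extra weight given to the very first
    // fragment of a document (0.5 means a 1.5x multiplier).
    static final double POSITION_BOOST = 0.5;

    // Multiplies a fragment's term score by a factor that decays with the
    // fragment's position: index 0 gets 1 + POSITION_BOOST, later fragments
    // approach 1.0.
    static double boosted(double termScore, int fragmentIndex) {
        if (fragmentIndex < 0) {
            throw new IllegalArgumentException("fragmentIndex must be >= 0");
        }
        return termScore * (1.0 + POSITION_BOOST / (1 + fragmentIndex));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.printf("fragment %d -> %.3f%n", i, boosted(1.0, i));
        }
    }
}
```

Because the boost is multiplicative, a fragment with no matching terms still scores 0. The catch is that a strong enough boost lets an early fragment outrank a later one with more matches, so the constant would need tuning.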
Doron Cohen wrote:
This question was raised on the users' list:
http://www.nabble.com/highlighting-tf2322109.html
Assume three fragments and two queries:
f1 = aa 11 bb 33 cc
f2 = aa 11 bb 11 cc
f3 = aa 11 bb 22 cc
q1 = 11 22
q2 = 11
Now we call highlighter.getBestFragment(q);
For q1, f3 is returned, as expected.
For q2, f1 is returned, although "11" appears twice in f2 but only once in
f1.
This is because QueryScorer.getTokenScore(Token) counts only unique
fragment tokens, so f1, f2 and f3 all receive the same score for q2.
Would it make sense to make this behavior controllable?
(It is easily done, but I am not sure about the consequences.)
Or perhaps there is a way to achieve this behavior (preferring f2 over f1
for q2 above) that I missed?
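Here is a simplified Java model of both behaviours behind a single switch. It is a standalone sketch for Java 9+ (it uses List.of and Set.of). It assumes unit term weights and that the earliest fragment wins a tie. The countRepeats flag is hypothetical and not part of Lucene's Highlighter API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConfigurableRepeatScoring {
    // Scores a whitespace-tokenized fragment with unit weights per query term.
    // countRepeats == false: each distinct query term counts once per fragment.
    // countRepeats == true: every occurrence of a query term counts.
    static double score(String fragment, Set<String> queryTerms, boolean countRepeats) {
        Set<String> seenInFragment = new HashSet<>();
        double total = 0.0;
        for (String token : fragment.split("\\s+")) {
            if (!queryTerms.contains(token)) {
                continue;
            }
            boolean firstOccurrence = seenInFragment.add(token);
            if (firstOccurrence || countRepeats) {
                total += 1.0;
            }
        }
        return total;
    }

    // Returns the highest-scoring fragment; the strict '>' means the earliest
    // fragment wins a tie (an assumption of this model).
    static String best(List<String> fragments, Set<String> queryTerms, boolean countRepeats) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String fragment : fragments) {
            double s = score(fragment, queryTerms, countRepeats);
            if (s > bestScore) {
                best = fragment;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> fragments = List.of("aa 11 bb 33 cc", "aa 11 bb 11 cc", "aa 11 bb 22 cc");
        Set<String> q1 = Set.of("11", "22");
        Set<String> q2 = Set.of("11");
        for (boolean countRepeats : new boolean[] {false, true}) {
            System.out.println("countRepeats=" + countRepeats
                + "  q1 -> " + best(fragments, q1, countRepeats)
                + "  q2 -> " + best(fragments, q2, countRepeats));
        }
    }
}
```

With countRepeats=false the model picks f3 for q1 and f1 for q2, matching the behaviour described above. With countRepeats=true it picks f2 for q2. For q1, however, f2 and f3 both score 2 and f2 wins the tie. That is one consequence worth weighing before exposing such a switch.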
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]