On Thu, Jun 29, 2006, Marvin Humphrey wrote about "Re: Flexible index format /
Payloads Cont'd":
> * Improve IR precision, by writing a Boolean Scorer that
> takes position into account, a la Brin/Page '98.
Yes, I'd love to see that too (and it doesn't even require any new payload
support: the positions that Lucene already stores are enough).
I tried a small test using the TREC-8 corpus and query-relevance judgements,
and saw a noticeable improvement in precision when I added a simplistic
version of this feature: I "or"ed the original query words with a
SpanNearQuery for each pair of words in the query, so the query
"hot dog bun" is converted to something similar to:
hot OR dog OR bun OR "hot dog"~7^0.25 "dog bun"~7^0.25 "hot bun"~7^0.25
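A minimal sketch of that expansion, assuming plain string building rather
than real SpanNearQuery objects. The class and method names are hypothetical,
and the "..."~slop^boost notation is only a stand-in for a span-near clause:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds the expanded query text.
final class ProximityExpansion {

    // OR the original terms with one boosted proximity clause per unordered pair.
    static String expand(List<String> terms, int slop, double boost) {
        List<String> clauses = new ArrayList<>(terms);
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                clauses.add("\"" + terms.get(i) + " " + terms.get(j) + "\"~"
                        + slop + "^" + boost);
            }
        }
        return String.join(" OR ", clauses);
    }
}
```

Calling expand(List.of("hot", "dog", "bun"), 7, 0.25) yields the three terms
followed by the pairs "hot dog", "hot bun" and "dog bun", each with ~7^0.25.
A query of n words produces n*(n-1)/2 pair clauses, so each word appears in
n clauses overall.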
But this "solution" is obviously not the best we can do: it is inefficient
(it goes through each posting list three times) and not tuned. A better
solution would be, as you said, to create a modified version of
BooleanQuery's scoring.
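To illustrate what a position-aware scorer could do with positions it has
already read, here is a rough sketch, not Lucene code. It assumes each term's
positions in a document arrive as an ascending int array, and it uses a plain
distance threshold rather than Lucene's slop semantics. All names are
hypothetical:

```java
// Hypothetical sketch: proximity bonus computed in one pass over two position lists.
final class ProximitySketch {

    // Smallest |p - q| between two ascending position arrays (two-pointer merge).
    // Returns Integer.MAX_VALUE if either array is empty.
    static int minDistance(int[] a, int[] b) {
        int i = 0, j = 0, best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            // Advance the pointer at the smaller position; only that can shrink the gap.
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }

    // Assumed scoring rule: add the boost when the two terms occur within maxGap positions.
    static double pairBonus(int[] a, int[] b, int maxGap, double boost) {
        return minDistance(a, b) <= maxGap ? boost : 0.0;
    }
}
```

With a rule like this, each term's positions are read once per document and
reused for every pair, instead of once per clause.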
--
Nadav Har'El | Friday, Jun 30 2006, 4 Tammuz 5766
IBM Haifa Research Lab |-----------------------------------------
|Give Yogi a rifle. Support your right to
http://nadav.harel.org.il |arm bears!