Re: Flexible index format / Payloads Cont'd

Nadav Har'El Fri, 30 Jun 2006 06:07:56 -0700

On Thu, Jun 29, 2006, Marvin Humphrey wrote about "Re: Flexible index format / 
Payloads Cont'd":
>   * Improve IR precision, by writing a Boolean Scorer that
>     takes position into account, a la Brin/Page '98.


Yes, I'd love to see that too (and it doesn't even require any new payloads
support, the positions that Lucene already has are enough).

I tried a small test using the TREC-8 corpus and query-relevance judgements,
and saw a noticeable improvement in precision when I added a simplistic
version of this feature: I "or"ed the original query words with a
SpanNearQuery for each pair of words in the query, so the query
"hot dog bun" is converted to something similar to:

        hot OR dog OR bun OR "hot dog"~7^0.25 "dog bun"~7^0.25 "hot bun"~7^0.25

But this "solution" is obviously not the best we can do: it is inefficient
(it goes through each posting list three times) and not tuned. A better
solution would be, as you said, to create a modified version of
BooleanQuery's scoring.
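
For illustration, the pairwise expansion described above could be sketched
roughly as follows. This only builds a query string in the notation used in
the example; it does not call Lucene. The function name expand_with_pairs is
hypothetical, the slop of 7 and boost of 0.25 are taken from the example, and
the "..."~7 phrase notation is only an approximation of a SpanNearQuery.

```python
from itertools import combinations


def expand_with_pairs(words, slop=7, boost=0.25):
    """Return the original words OR'ed with one proximity clause per word pair.

    Hypothetical helper: the slop and boost defaults come from the example
    query in this message and are not tuned values.
    """
    singles = list(words)
    # One low-boost proximity clause for every unordered pair of query words.
    pairs = [f'"{a} {b}"~{slop}^{boost}' for a, b in combinations(words, 2)]
    return " OR ".join(singles + pairs)


if __name__ == "__main__":
    print(expand_with_pairs(["hot", "dog", "bun"]))
```

With n query words this produces n single-word clauses plus n*(n-1)/2 pair
clauses, and each word appears in n clauses (once alone and once per pair).
For the three-word example that is the "three times" per posting list
mentioned above.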

-- 
Nadav Har'El                        |       Friday, Jun 30 2006, 4 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |Give Yogi a rifle. Support your right to
http://nadav.harel.org.il           |arm bears!
