Re: Similarity.lengthNorm and positionIncrement=0

Andrzej Bialecki Sun, 12 Oct 2008 22:09:25 -0700

Michael McCandless wrote:

I agree we should make this possible. A field should not be "penalized" just because many of its terms had synonyms.
In your proposed method addition to Similarity, below, numOverlappingTokens would count the number of tokens that had positionIncrement==0? And then that default impl is fully backwards compatible since it falls back to the current approach of counting the overlapping tokens when computing lengthNorm?


Yes, and yes.

Maybe in 3.0 we should then switch it to not count overlapping tokens by default.

I'm not sure. There are good arguments for and against it, which is why I suggested adding it as an option.

If a typical use case is to submit queries with multiple synonyms, then the current method works better, because it prevents excessive score boosting from multiple matching synonyms. OTOH, if a typical use case is that users submit queries consisting of a single synonym, then the proposed method works better, because the field's norm is not reduced by synonym tokens that the query never matches.
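
To make the trade-off concrete, here is a rough sketch of the two ways of computing the norm. It assumes the usual default of 1/sqrt(numTokens); the class name, the three-argument overload and the discountOverlaps flag are made up for illustration and are not the actual Similarity API or the final patch.

```java
// Hypothetical sketch only: names and signatures are assumptions,
// not the real Lucene Similarity API.
final class LengthNormSketch {

    /** Current approach: every token counts, including those with positionIncrement == 0. */
    static float lengthNorm(int numTokens) {
        // Guard against an empty field to avoid dividing by zero in this sketch.
        return (float) (1.0 / Math.sqrt(Math.max(numTokens, 1)));
    }

    /**
     * Proposed-style variant (assumed signature). When discountOverlaps is false
     * it falls back to the current approach, so existing behavior is unchanged.
     */
    static float lengthNorm(int numTokens, int numOverlappingTokens, boolean discountOverlaps) {
        int effective = discountOverlaps ? numTokens - numOverlappingTokens : numTokens;
        return lengthNorm(effective);
    }
}
```

For a field with 4 original terms, each with one synonym injected at positionIncrement=0 (numTokens=8, numOverlappingTokens=4), the current approach gives 1/sqrt(8), about 0.354, while discounting the overlaps gives 1/sqrt(4) = 0.5.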


I'll create a JIRA issue and prepare a patch.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

