Michael McCandless wrote:

I agree we should make this possible. A field should not be "penalized" just because many of its terms had synonyms.

In your proposed method addition to Similarity, below, numOverlappingTokens would count the number of tokens that had positionIncrement==0? And then that default impl is fully backwards compatible since it falls back to the current approach of counting the overlapping tokens when computing lengthNorm?

Yes, and yes.
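
As a rough illustration of the behaviour confirmed above, here is a minimal, self-contained Java sketch. It is not the actual patch and does not use Lucene classes: the class name, the three-argument lengthNorm overload and the discountOverlaps flag are hypothetical names chosen for this example. The only thing taken from Lucene is the 1/sqrt(numTokens) formula that DefaultSimilarity uses for lengthNorm.

```java
// Hypothetical sketch, not Lucene code: names and signatures are assumptions.
class LengthNormSketch {

    // Count tokens that sit on top of a previous token (positionIncrement == 0),
    // which is how injected synonyms typically appear in a token stream.
    static int countOverlaps(int[] positionIncrements) {
        int overlaps = 0;
        for (int inc : positionIncrements) {
            if (inc == 0) {
                overlaps++;
            }
        }
        return overlaps;
    }

    // Current approach: every token counts towards the field length.
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Assumed new variant: with discountOverlaps == false it falls back to the
    // current approach, so the default stays backwards compatible.
    static float lengthNorm(int numTokens, int numOverlappingTokens,
                            boolean discountOverlaps) {
        int effective = discountOverlaps
                ? numTokens - numOverlappingTokens
                : numTokens;
        return lengthNorm(effective);
    }

    public static void main(String[] args) {
        // Three real positions, three synonyms stacked on them.
        int[] increments = {1, 0, 1, 0, 0, 1};
        int overlaps = countOverlaps(increments);
        System.out.println("overlapping tokens: " + overlaps);
        System.out.println("norm, counting overlaps:    "
                + lengthNorm(increments.length, overlaps, false));
        System.out.println("norm, discounting overlaps: "
                + lengthNorm(increments.length, overlaps, true));
    }
}
```

Keeping the flag off by default means existing indexes and scores are unaffected unless an application opts in.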


Maybe in 3.0 we should then switch it to not count overlapping tokens by default.

I'm not sure. There are good arguments for and against it, which is why I suggested adding it as an option.

If a typical use case is to submit queries with multiple synonyms, then the current method works better, because it prevents excessive score boosting from multiple matching synonyms. OTOH, if a typical use case is that users submit queries consisting of a single synonym, then the proposed method works better.
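
The trade-off described above can be made concrete with a toy calculation. This is only a sketch under strong assumptions: the score is simplified to (number of matching query terms) / sqrt(field length), ignoring idf, coord and boosts, so it is not Lucene's real scoring formula. The class and method names are hypothetical. The example compares a plain two-token field with a field holding the same two positions plus two synonyms stacked on each (6 tokens, 4 overlaps).

```java
// Toy model, not Lucene's scoring: names and the formula are assumptions.
class SynonymNormTradeoff {

    // Each matching query term contributes tf (= 1) times the length norm.
    static double toyScore(int matchingQueryTerms, int numTokens,
                           int numOverlaps, boolean discountOverlaps) {
        int length = discountOverlaps ? numTokens - numOverlaps : numTokens;
        return matchingQueryTerms / Math.sqrt(length);
    }

    public static void main(String[] args) {
        // Baseline: field without synonyms, one query term matches.
        double plain = toyScore(1, 2, 0, false);

        // Single-synonym query against the synonym-expanded field.
        double singleCounted = toyScore(1, 6, 4, false);
        double singleDiscounted = toyScore(1, 6, 4, true);

        // Query listing all three synonyms against the expanded field.
        double multiCounted = toyScore(3, 6, 4, false);
        double multiDiscounted = toyScore(3, 6, 4, true);

        System.out.println("plain field, one match:          " + plain);
        System.out.println("single synonym, overlaps counted: " + singleCounted);
        System.out.println("single synonym, discounted:       " + singleDiscounted);
        System.out.println("three synonyms, overlaps counted: " + multiCounted);
        System.out.println("three synonyms, discounted:       " + multiDiscounted);
    }
}
```

In this toy model, counting overlaps lowers the single-synonym score below the plain field's score, while discounting them restores it; for the three-synonym query, discounting leaves the full threefold boost in place, whereas counting overlaps dampens it.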

I'll create a JIRA issue and prepare a patch.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

