Michael McCandless wrote:
> I agree we should make this possible. A field should not be "penalized"
> just because many of its terms had synonyms.
>
> In your proposed method addition to Similarity, below,
> numOverlappingTokens would count the number of tokens that had
> positionIncrement==0? And then that default impl is fully backwards
> compatible since it falls back to the current approach of counting the
> overlapping tokens when computing lengthNorm?

Yes, and yes.
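
To make the two answers concrete, here is a minimal sketch of how such a hook might look. It is not the Lucene API and not the eventual patch: the class names SimilaritySketch and DiscountingSimilarity and the three-argument lengthNorm overload are hypothetical, chosen only for illustration. The one thing taken from existing behavior is the 1/sqrt(numTokens) default used by DefaultSimilarity.

```java
// Hypothetical stand-in for Lucene's Similarity; not the real class.
class SimilaritySketch {
    // Mirrors the existing default: the norm shrinks as the field grows.
    public float lengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Proposed addition (this signature is an assumption).
    // numOverlappingTokens is the number of tokens with positionIncrement == 0.
    // The default ignores it, so overlapping tokens are still counted and
    // existing behavior is unchanged.
    public float lengthNorm(String fieldName, int numTokens, int numOverlappingTokens) {
        return lengthNorm(fieldName, numTokens);
    }
}

// Opt-in subclass that leaves overlapping tokens out of the field length.
class DiscountingSimilarity extends SimilaritySketch {
    @Override
    public float lengthNorm(String fieldName, int numTokens, int numOverlappingTokens) {
        return lengthNorm(fieldName, numTokens - numOverlappingTokens);
    }
}
```

With this shape, existing Similarity subclasses keep their current norms, and only code that overrides the new overload sees a difference.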

> Maybe in 3.0 we should then switch it to not count overlapping tokens by
> default.

I'm not sure. There are good arguments for and against it, that's why I
suggested adding it as an option.
If a typical use case is to submit queries with multiple synonyms, then
the current method works better, because it prevents excessive score
boosting from multiple matching synonyms. OTOH, if a typical use case is
that users submit queries consisting of a single synonym, then the
proposed method works better, because the field's lengthNorm is not
reduced by synonyms that the query never matches.
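
A toy calculation may help show the trade-off. This is a sketch under loud assumptions: the class NormTradeoffSketch and its toyScore formula are invented for illustration, each matching query term is assumed to contribute exactly one norm's worth of score, and tf, idf, coord, and the lossy norm encoding are all ignored.

```java
// Toy illustration only; this is not how Lucene computes a full score.
class NormTradeoffSketch {
    // Length norm in the style of the existing default: 1/sqrt(length).
    static double norm(int fieldLength) {
        return 1.0 / Math.sqrt(fieldLength);
    }

    // Assumed toy score: every matching query term adds one norm's worth.
    static double toyScore(int matchingQueryTerms, int fieldLength) {
        return matchingQueryTerms * norm(fieldLength);
    }

    public static void main(String[] args) {
        int originalTokens = 10; // tokens at distinct positions
        int synonymTokens = 10;  // injected with positionIncrement == 0

        int countedLength = originalTokens + synonymTokens; // current behavior
        int discountedLength = originalTokens;              // proposed option

        // Query with a single synonym: one matching term.
        System.out.printf("single synonym:  current=%.3f proposed=%.3f%n",
                toyScore(1, countedLength), toyScore(1, discountedLength));

        // Query with two synonyms of the same word: two matching terms.
        System.out.printf("two synonyms:    current=%.3f proposed=%.3f%n",
                toyScore(2, countedLength), toyScore(2, discountedLength));

        // Baseline: same text indexed without synonyms, one matching term.
        System.out.printf("no-synonym doc:  %.3f%n", toyScore(1, originalTokens));
    }
}
```

Under these assumptions, the single-synonym query scores about 0.224 against the synonym-expanded field with the current method, versus about 0.316 for the same text indexed without synonyms, which is the "penalty" discussed above; the proposed method removes that gap. For the two-synonym query the proposed method doubles the baseline score, while the current method limits the gain to a factor of about 1.41.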
I'll create a JIRA issue and prepare a patch.
--
Best regards,
Andrzej Bialecki <><