[
https://issues.apache.org/jira/browse/LUCENE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013263#comment-16013263
]
Robert Muir commented on LUCENE-7730:
-------------------------------------
+1
This solves a hairy problem in a non-intrusive way and is a much better
tradeoff for users. I ran some basic relevance tests and it all checks out,
including 6.x back compat. I see the typical ~1% difference in this corpus that
I would see vs. using e.g. a 32-bit integer. But for e.g. very small docs,
users will be much happier and less likely to complain about the quantization
to a single byte.
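For readers following along: here is a minimal, self-contained sketch of the kind of single-byte quantization being discussed, using a small mantissa/exponent split so that the encoding stays monotonic. This is an illustration only, not Lucene's actual SmallFloat code from the patch; the class and method names are hypothetical. Small lengths round-trip exactly, while larger ones keep only their top four significant bits, which is where the small relative error comes from.

```java
public class LengthNormSketch {

    // Encode a positive int into one unsigned byte (returned as an int in 0..255).
    // Values 0..15 are exact; larger values keep only the top 4 significant bits,
    // so the relative error is bounded by 1/8 = 12.5%.
    static int intToByte4(int i) {
        if (i < 16) {
            return i; // small values round-trip exactly
        }
        int bits = 32 - Integer.numberOfLeadingZeros(i); // position of highest set bit
        int shift = bits - 4;                            // low-order bits to drop
        int mantissa = (i >>> shift) & 0x07;             // 3 bits below the implicit leading 1
        return (shift << 3) + mantissa + 8;              // monotonic in i
    }

    // Lossy inverse: returns the smallest int that encodes to the same byte.
    static int byte4ToInt(int b) {
        if (b < 16) {
            return b;
        }
        int shift = (b - 8) >>> 3;
        int mantissa = ((b - 8) & 0x07) | 0x08;          // restore the implicit leading 1
        return mantissa << shift;
    }

    public static void main(String[] args) {
        for (int i : new int[] {1, 7, 15, 16, 100, 1000, 1_000_000}) {
            System.out.println(i + " -> byte " + intToByte4(i)
                    + " -> " + byte4ToInt(intToByte4(i)));
        }
    }
}
```

Because the encoding is monotonic, comparing encoded bytes preserves the ordering of the original lengths, which is what matters for length normalization.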
I think it is fine to move TFIDFSimilarity/ClassicSimilarity to misc/. Another
option is to fold them into one class, clean up the abstractions, and fix them
to use this encoding too. TFIDFSimilarity was really just a migration thing
(it's basically the pre-4.x Similarity API). It is kind of a rotting
abstraction/tech debt since it has fallen behind. But I think these days, for
custom TF/IDF-like scoring, you'd just use Similarity or SimilarityBase so that
you have all the index statistics and so on? Worth a thought.
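To illustrate the point about index statistics: a custom TF/IDF-like score really only needs a handful of per-term and per-collection statistics. The sketch below is plain Java with no Lucene types, and the formula is a generic classic-style tf-idf with a 1/sqrt(docLen) length norm; it is not any particular Lucene Similarity implementation, and the class name is hypothetical.

```java
public class TfIdfSketch {

    /**
     * Classic TF/IDF-style score for one term in one document.
     *
     * @param freq     occurrences of the term in the document
     * @param docFreq  number of documents containing the term
     * @param docCount total number of documents in the index
     * @param docLen   length of the document in tokens
     */
    static double score(float freq, long docFreq, long docCount, float docLen) {
        double tf = Math.sqrt(freq);                                   // sublinear term frequency
        double idf = 1 + Math.log((double) docCount / (docFreq + 1));  // rare terms weigh more
        double norm = 1.0 / Math.sqrt(docLen);                         // damp long documents
        return tf * idf * norm;
    }

    public static void main(String[] args) {
        // A rare term in a short doc should outscore a common term in a long one.
        System.out.println(score(2, 10, 100_000, 50));
        System.out.println(score(2, 50_000, 100_000, 500));
    }
}
```

All four inputs (freq, docFreq, docCount, docLen) are statistics the newer Similarity APIs hand you directly, which is why subclassing there is simpler than working through the old TFIDFSimilarity abstractions.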
When can the old tables and backwards compatibility logic be removed from e.g.
BM25Similarity? I think that part is important.
> Better encode length normalization in similarities
> --------------------------------------------------
>
> Key: LUCENE-7730
> URL: https://issues.apache.org/jira/browse/LUCENE-7730
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Attachments: LUCENE-7730.patch, LUCENE-7730.patch, LUCENE-7730.patch
>
>
> Now that index-time boosts are gone (LUCENE-6819) and that indices record the
> version that was used to create them (for backward compatibility,
> LUCENE-7703), we can look into storing the length normalization factor more
> efficiently.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)