[ https://issues.apache.org/jira/browse/LUCENE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013263#comment-16013263 ]
Robert Muir commented on LUCENE-7730:
-------------------------------------

+1

This solves a hairy problem in a non-intrusive way and is a much better tradeoff for users. I ran some basic relevance tests and it all checks out, including 6.x back compat. I see the typical 1% difference in this corpus that I would see vs. using e.g. a 32-bit integer. But for e.g. very small docs, users will be much happier and less likely to complain about the quantization to a single byte.

I think it is fine to move TFIDFSimilarity/ClassicSimilarity to misc/. Another option is to fold them into one class, clean up the abstractions, and fix them to use this encoding too. TFIDFSimilarity was really just a migration thing (it's basically the pre-4.x Similarity API). It is kind of a rotting abstraction/tech debt, since it has fallen behind. But I think these days, for custom TF/IDF-like scoring, you'd just use Similarity or SimilarityBase so that you have all the index statistics and so on? Worth a thought.

When can the old tables and backwards compatibility logic be removed from e.g. BM25Similarity? I think that part is important.

> Better encode length normalization in similarities
> --------------------------------------------------
>
>                 Key: LUCENE-7730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7730
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>         Attachments: LUCENE-7730.patch, LUCENE-7730.patch, LUCENE-7730.patch
>
>
> Now that index-time boosts are gone (LUCENE-6819) and that indices record the
> version that was used to create them (for backward compatibility,
> LUCENE-7703), we can look into storing the length normalization factor more
> efficiently.