[ https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269666#comment-16269666 ]

Adrien Grand commented on LUCENE-8069:
--------------------------------------
I agree with what you said. I'm actually not sure how we should implement it. I
said "length" to remain vague, and because it is fair to assume that the score
will decrease if the length increases. But there are multiple ways we could do
it, and I'm not sure which one is best.

I like the simplicity of sorting by the value of the norm field, but as you
say, the fact that norms are opaque introduces issues with custom similarities,
or even with pre-7.0 similarities whose norm is inversely correlated with the
length.

Maybe we could work around it by involving the similarity and sorting by the
value of the score computed with some fake index statistics and freq=1. But I
think that is unsafe if two distinct lengths produce the same score with those
fake index stats but not with other stats.

Another idea could be to introduce a new API, something like
{{Similarity.newSortField}}, so that the similarity itself could tell us how to
sort documents by decreasing score assuming a constant freq.
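
To make the ordering problem concrete, here is a minimal, self-contained
sketch. It does not use any Lucene classes: {{bm25}}, {{inverseNorm}}, the
average length of 30 and the sample lengths are all made up for illustration.
It shows that sorting by decreasing score at freq=1 puts the shortest fields
first for a BM25-style formula, while sorting by ascending value of a norm that
is inversely correlated with the length puts the longest fields first.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

public class LengthSortSketch {

    // Toy BM25-style score of a single term match, with idf fixed to 1.
    // k1 and b are the commonly used defaults; avgLength is a fake statistic.
    static double bm25(int freq, int length, double avgLength) {
        double k1 = 1.2, b = 0.75;
        double lengthFactor = k1 * (1 - b + b * length / avgLength);
        return freq * (k1 + 1) / (freq + lengthFactor);
    }

    // Toy norm that shrinks as the field gets longer (an assumption made
    // here for illustration, not the actual on-disk norm encoding).
    static double inverseNorm(int length) {
        return 1.0 / Math.sqrt(length);
    }

    // Sorts the given field lengths by ascending value of the key function.
    static int[] sortAscendingBy(int[] lengths, ToDoubleFunction<Integer> key) {
        return Arrays.stream(lengths).boxed()
                .sorted(Comparator.comparingDouble(key))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    // Orders lengths by decreasing score, assuming freq=1 for every document.
    static int[] sortByDecreasingScore(int[] lengths, double avgLength) {
        return sortAscendingBy(lengths, len -> -bm25(1, len, avgLength));
    }

    // Orders lengths by ascending value of the opaque toy norm.
    static int[] sortByAscendingNorm(int[] lengths) {
        return sortAscendingBy(lengths, len -> inverseNorm(len));
    }

    public static void main(String[] args) {
        int[] lengths = {40, 5, 120, 12};
        // Shortest first: [5, 12, 40, 120]
        System.out.println(Arrays.toString(sortByDecreasingScore(lengths, 30.0)));
        // Longest first: [120, 40, 12, 5]
        System.out.println(Arrays.toString(sortByAscendingNorm(lengths)));
    }
}
```

The two orderings are exact opposites, which is why a sort on the raw norm
value cannot be correct for every similarity unless the similarity is involved
in defining the order.
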
> Allow index sorting by field length
> -----------------------------------
>
> Key: LUCENE-8069
> URL: https://issues.apache.org/jira/browse/LUCENE-8069
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Adrien Grand
> Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by
> field length would mean we would be likely to collect best matches first.
> Depending on the similarity implementation, this might even allow early
> termination of the collection of top documents on term queries.