[ https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269976#comment-16269976 ]
Robert Muir commented on LUCENE-8069:
-------------------------------------
Sorry, I don't agree with this issue. In my opinion the whole assumption,
"Short documents are more likely to get higher scores", is wrong. It's simply
not the case at all: longer documents also come with higher expected term
frequencies. If the similarity is biased towards either shorter or longer
documents, that's a bug, not a desirable quality. So I don't think we should
offer this option.
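To make the point about expected term frequencies concrete, here is a minimal sketch using the textbook BM25 term-frequency component. It is not Lucene's actual scoring code; the class name, method name, and all numbers are hypothetical, and k1 = 1.2, b = 0.75 are assumed only because they are commonly used defaults. It compares a short and a long document at the same raw term frequency and at the same term density.

```java
public class Bm25LengthSketch {

    // Textbook BM25 term-frequency component (an illustration, not Lucene's code):
    //   tf / (tf + k1 * (1 - b + b * docLen / avgDocLen))
    static double tfNorm(double tf, double docLen, double avgDocLen, double k1, double b) {
        double lengthFactor = 1.0 - b + b * (docLen / avgDocLen);
        return tf / (tf + k1 * lengthFactor);
    }

    public static void main(String[] args) {
        // Hypothetical collection statistics and parameters.
        double avgDocLen = 200.0;
        double k1 = 1.2;
        double b = 0.75;

        // Same raw term frequency: the shorter document scores higher.
        double shortSameTf = tfNorm(2, 100, avgDocLen, k1, b);
        double longSameTf = tfNorm(2, 1000, avgDocLen, k1, b);

        // Same term density (2 per 100 tokens): the longer document has
        // proportionally more occurrences and is not penalized.
        double longSameDensity = tfNorm(20, 1000, avgDocLen, k1, b);

        System.out.printf("short, tf=2:  %.4f%n", shortSameTf);
        System.out.printf("long,  tf=2:  %.4f%n", longSameTf);
        System.out.printf("long,  tf=20: %.4f%n", longSameDensity);
    }
}
```

In this sketch field length alone only predicts the score when the term frequency is held fixed. Once the term frequency grows with document length, the ordering by length no longer matches the ordering by score.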
We can find literature that indicates sorting by the number of unique terms may
improve index compression, but that's it, and that's not what we store in the
norm, not even for omit-TF fields (LUCENE-8031).
> Allow index sorting by field length
> -----------------------------------
>
> Key: LUCENE-8069
> URL: https://issues.apache.org/jira/browse/LUCENE-8069
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Adrien Grand
> Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by
> field length would mean we would be likely to collect best matches first.
> Depending on the similarity implementation, this might even allow early
> termination of the collection of top documents on term queries.