[ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269976#comment-16269976
 ] 

Robert Muir commented on LUCENE-8069:
-------------------------------------

Sorry, I don't agree with this issue. In my opinion the whole assumption,
"Short documents are more likely to get higher scores", is wrong. It's simply not
the case at all. With longer documents come higher expected term frequencies
too. If the similarity is biased towards either shorter or longer documents,
that's a bug, not a desirable quality. So I don't think we should offer this
option.
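
As a rough illustration of the term-frequency point, here is a minimal sketch
that assumes the textbook BM25 term-frequency formula with k1 = 1.2 and
b = 0.75. The function name, the document lengths, and the average length are
hypothetical, and this is not Lucene's actual scoring code.

```python
def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of textbook BM25 (idf factor omitted)."""
    # Length normalization: longer-than-average documents get a larger norm.
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return tf * (k1 + 1.0) / (tf + norm)


# Hypothetical numbers: the average field length is assumed to be 100 terms.
short_doc = bm25_tf(tf=2, doc_len=100, avg_len=100)
# Twice as long, and the term occurs twice as often.
long_doc = bm25_tf(tf=4, doc_len=200, avg_len=100)
# Twice as long, but the term occurs no more often.
long_doc_same_tf = bm25_tf(tf=2, doc_len=200, avg_len=100)

print(short_doc, long_doc, long_doc_same_tf)
```

Under these assumptions the longer document only scores lower when its term
frequency does not grow with its length. When the frequency grows in
proportion, the longer document scores slightly higher than the short one.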

We can find literature that indicates sorting by the number of unique terms may
improve index compression, but that's it, and that's not what we store in the
norm, not even for omit TF fields (LUCENE-8031).

> Allow index sorting by field length
> -----------------------------------
>
>                 Key: LUCENE-8069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8069
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by
> field length would mean we would be likely to collect the best matches first.
> Depending on the similarity implementation, this might even allow early
> termination of the collection of top documents on term queries.


