[jira] [Commented] (LUCENE-8069) Allow index sorting by field length

Adrien Grand (JIRA) Tue, 28 Nov 2017 15:15:21 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269666#comment-16269666
 ]


Adrien Grand commented on LUCENE-8069:
--------------------------------------

Agreed with what you said. I'm actually not sure how we should implement it. I 
said "length" to remain vague, and because it is fair to assume that the score 
decreases as the length increases. But there are multiple ways we could do it, 
and I'm not sure which one is better.

I like the simplicity of sorting by the value of the norm field but, as you 
say, the fact that norms are opaque causes problems with custom similarities, 
or even pre-7.0 similarities whose norm is inversely correlated with the length.
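
To make the opacity problem concrete, here is a minimal sketch in plain Java 
(no Lucene classes). The two encodings are toy stand-ins, not the actual norm 
encodings: one grows with the length, the other uses 1/sqrt(length) and so 
shrinks as the length grows. {{OpaqueNormSketch}} and {{sortByNorm}} are 
hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntToDoubleFunction;

public class OpaqueNormSketch {

    // Hypothetical helper: returns doc ids ordered by ascending norm value,
    // where toNorm is whatever (opaque) encoding the similarity chose.
    static Integer[] sortByNorm(int[] lengths, IntToDoubleFunction toNorm) {
        Integer[] docs = new Integer[lengths.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingDouble(
                (Integer d) -> toNorm.applyAsDouble(lengths[d])));
        return docs;
    }

    public static void main(String[] args) {
        int[] lengths = {40, 5, 120, 12}; // field length of docs 0..3

        // Toy encoding A: the norm grows with the length.
        Integer[] a = sortByNorm(lengths, len -> len);
        // Toy encoding B: the norm shrinks as the length grows.
        Integer[] b = sortByNorm(lengths, len -> 1.0 / Math.sqrt(len));

        System.out.println(Arrays.toString(a)); // shortest doc first
        System.out.println(Arrays.toString(b)); // longest doc first
    }
}
```

The same "ascending norm" sort puts the shortest document first under one 
encoding and last under the other, which is why the sort cannot be defined on 
the raw norm value alone.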

Maybe we could work around it by involving the similarity and sorting by the 
value of the score for some fake index statistics and freq=1. But I think it is 
unsafe if two distinct lengths produce the same score with those fake index 
stats but not with other stats. Another idea could be to introduce a new API, 
something like {{Similarity.newSortField}} so that the similarity could tell us 
how to sort documents by decreasing scores assuming a constant freq.
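
A rough sketch of the "fake index statistics and freq=1" idea, in plain Java 
rather than against the Lucene API. It assumes the textbook BM25 formula with 
k1=1.2 and b=0.75, drops the idf factor (constant for a single term), and uses 
a made-up average field length. {{FakeStatsSortSketch}}, {{bm25}} and 
{{sortByFakeScore}} are hypothetical names, not existing Lucene methods.

```java
import java.util.Arrays;
import java.util.Comparator;

public class FakeStatsSortSketch {

    // Textbook BM25 term-frequency component. idf is omitted because it is
    // the same for every document when scoring a single term.
    static double bm25(double freq, double length, double avgLength,
                       double k1, double b) {
        double norm = k1 * (1 - b + b * length / avgLength);
        return freq * (k1 + 1) / (freq + norm);
    }

    // Hypothetical helper: orders doc ids by decreasing score, computed with
    // freq=1 and a fake average length instead of real index statistics.
    static Integer[] sortByFakeScore(int[] lengths, double fakeAvgLength) {
        Integer[] docs = new Integer[lengths.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingDouble(
                (Integer d) -> -bm25(1, lengths[d], fakeAvgLength, 1.2, 0.75)));
        return docs;
    }

    public static void main(String[] args) {
        int[] lengths = {40, 5, 120, 12}; // field length of docs 0..3
        // Doc ids from highest to lowest fake score.
        System.out.println(Arrays.toString(sortByFakeScore(lengths, 10.0)));
    }
}
```

With this formula the fake score strictly decreases as the length grows, so 
the order matches a sort by increasing length. The concern raised above is a 
similarity for which two distinct lengths tie under the fake statistics but 
not under the real ones; this sketch does not guard against that case.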

> Allow index sorting by field length
> -----------------------------------
>
>                 Key: LUCENE-8069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8069
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by 
> field length would mean we would be likely to collect the best matches first. 
> Depending on the similarity implementation, this might even allow early 
> termination of the collection of top documents on term queries.



