[ 
https://issues.apache.org/jira/browse/LUCENE-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542947#comment-13542947
 ] 

Michael McCandless commented on LUCENE-4622:
--------------------------------------------

I think the limitation here is that I cannot specify 2 sort fields (like I can 
in Lucene), right?  I.e., I would sort by count (descending), then by label 
(ascending), and then both the top K selection and the sorting of the final 
top K facets would be "correct".  A second limitation is that "sort by label" 
can be costly in general, because you'd have to resolve ord -> label for both 
entries whenever the primary sort keys were equal.
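
To make the two-key idea concrete, here is a rough sketch of a top K 
selection that sorts by count (descending) and falls back to the label 
(ascending) only on ties.  This is not the facet module's code: FacetTopK, 
OrdCount, labelOf and the labelLookups counter are all made up for 
illustration, and labelOf just stands in for whatever resolves ord -> label.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

final class FacetTopK {
    /** Hypothetical holder for one facet ordinal and its count. */
    static final class OrdCount {
        final int ord;
        final int count;
        OrdCount(int ord, int count) { this.ord = ord; this.count = count; }
    }

    /** Counts ord -> label lookups so the cost of tie-breaking is visible. */
    static int labelLookups = 0;

    static List<OrdCount> topK(List<OrdCount> all, int k, IntFunction<String> labelOf) {
        // "Best first" order: count descending, then label ascending.
        // Labels are resolved only when the counts are equal.
        Comparator<OrdCount> bestFirst = (a, b) -> {
            if (a.count != b.count) {
                return Integer.compare(b.count, a.count);
            }
            labelLookups += 2;
            return labelOf.apply(a.ord).compareTo(labelOf.apply(b.ord));
        };
        // Heap ordered worst-first, so the head is the entry to evict.
        PriorityQueue<OrdCount> pq = new PriorityQueue<>(bestFirst.reversed());
        for (OrdCount oc : all) {
            pq.offer(oc);
            if (pq.size() > k) {
                pq.poll(); // drop the current worst entry
            }
        }
        List<OrdCount> result = new ArrayList<>(pq);
        result.sort(bestFirst);
        return result;
    }
}
```

With the example from the issue (Lisa=2, Frank=1, Susan=1, Bob=1, ords 
assigned in that order) and k=3, this keeps Lisa, Bob, Frank.  The 
labelLookups counter grows with every tied comparison, which is the cost 
concern: with many equal counts, nearly every comparison resolves two labels.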

It's true Lucene tie breaks by docID, but then the app can specify multiple 
sort fields so that a dup result really in fact looks like a dup to the user as 
well and then the tie-break doesn't matter much.

Anyway, given that the app can just sort after-the-fact, and given the cost of 
sorting-by-label, I think we shouldn't fix this for now ... we can revisit if 
the issue ever arises in a real app.
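
For reference, the after-the-fact sort on the app side could look roughly 
like the sketch below.  FacetResort and LabelCount are hypothetical names, 
not facet module classes; the assumption is only that the app already holds 
the resolved label and count for each of the returned top K entries.

```java
import java.util.ArrayList;
import java.util.List;

final class FacetResort {
    /** Hypothetical app-side view of one returned facet entry. */
    static final class LabelCount {
        final String label;
        final int count;
        LabelCount(String label, int count) { this.label = label; this.count = count; }
    }

    /** Returns a copy sorted by count descending, then label ascending. */
    static List<LabelCount> resort(List<LabelCount> topK) {
        List<LabelCount> sorted = new ArrayList<>(topK);
        sorted.sort((a, b) -> a.count != b.count
            ? Integer.compare(b.count, a.count)
            : a.label.compareTo(b.label));
        return sorted;
    }
}
```

For Lisa (2), Frank (1), Susan (1), Bob (1) this yields Lisa, Bob, Frank, 
Susan.  It only reorders the entries that were returned; when there are ties 
at the K cut-off, which of the tied entries make it into the top K is still 
decided by ord.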
                
> TopKFacetsResultHandler should tie break sort by label not ord?
> ---------------------------------------------------------------
>
>                 Key: LUCENE-4622
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4622
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Michael McCandless
>
> EG I now get these facets:
> {noformat}
> Author (5)
>  Lisa (2)
>  Frank (1)
>  Susan (1)
>  Bob (1)
> {noformat}
> The primary sort is by count, but secondary is by ord (= order in which they 
> were indexed), which is not really understandable/transparent to the end 
> user.  I think it'd be best if we could do tie-break sort by label ...
> But talking to Shai, this seems hard/costly to fix, because when visiting the 
> facet ords to collect the top K, we don't currently resolve to label, and in 
> the worst case (say my example had a million labels with count 1) that's a 
> lot of extra label lookups ...

