[jira] [Commented] (SOLR-11831) Skip second grouping step if group.limit is 1 (aka Las Vegas patch)

Christine Poerschke (JIRA) Thu, 14 Mar 2019 13:39:16 -0700


    [ https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793052#comment-16793052 ]


Christine Poerschke commented on SOLR-11831:
--------------------------------------------

Hello [~mjosephidou] and [~diegoceccarelli], thank you for opening this ticket 
and associated pull request!

Just to say that I've started (a little bit) to take a look at the changes (and 
hope to look more next week):
 * The ASF GitHub Bot mirrors pull request comments into the 'Work Log' rather 
than the 'Comments', for clarity.
 * Of course anyone can comment on the pull request or on the JIRA ticket here, 
whatever works.
 * Your pull request currently includes changes to both Lucene and Solr code; 
nothing wrong with that. I've opened LUCENE-8728 separately (for clarity, 
hopefully) to explore whether the classes in question should be modified, 
extended, or handled in some other way.

>  Skip second grouping step if group.limit is 1 (aka Las Vegas patch)
> --------------------------------------------------------------------
>
>                 Key: SOLR-11831
>                 URL: https://issues.apache.org/jira/browse/SOLR-11831
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Malvina Josephidou
>            Priority: Minor
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In cases where we do grouping and ask for {{group.limit=1}} only, it is 
> possible to skip the second grouping step. In our test datasets this improved 
> speed by around 40%.
> Essentially, in the first grouping step each shard returns the top K groups 
> based on the highest scoring document in each group. The top K groups from 
> each shard are merged in the federator, and in the second step we ask all the 
> shards to return the top documents from each of the top-ranking groups.
> If we only want to return the highest scoring document per group we can 
> return the top document id in the first step, merge results in the federator 
> to retain the top K groups and then skip the second grouping step entirely. 
> This is possible provided that:
> a) We do not need to know the total number of matching documents per group.
> b) The within-group sort and the between-group sort are the same.
> c) We are not doing reranking (because reranking is done in the second 
> grouping step; it is also possible to get this to work with reranking, but 
> more work and some additional assumptions are required).
>  
> This patch applies the grouping optimisation in cases where a)-c) apply and 
> we are only sorting by relevance. It is also possible to extend this work to 
> handle multiple sorting criteria and also reranking. 
> P.S. Diego and I called this patch "las vegas" because we started to write it 
> on the flight to Las Vegas for Lucene/Solr Revolution. 
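
For illustration only, the single-step merge described in the quoted text could be sketched roughly as below. This is Python rather than Solr's Java, and it is not the code from the pull request: `GroupHead` and `merge_single_step` are hypothetical names, and the sketch assumes conditions a)-c) hold, that sorting is by relevance score only, and that each shard has already returned its top K groups together with the id and score of the best document in each group.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class GroupHead:
    """Hypothetical per-shard entry: a group plus its best document on that shard."""
    group: str     # group value
    doc_id: str    # id of the highest scoring document in the group on this shard
    score: float   # relevance score of that document


def merge_single_step(shard_responses, k):
    """Merge first-step shard responses and return the top k groups.

    Because every entry already carries its top document id, no second
    request to the shards is needed when only one document per group is wanted.
    """
    best = {}
    for response in shard_responses:
        for head in response:
            current = best.get(head.group)
            # Keep the highest scoring head seen so far for each group.
            if current is None or head.score > current.score:
                best[head.group] = head
    # Rank groups by the score of their best document (relevance sort only).
    return heapq.nlargest(k, best.values(), key=lambda h: h.score)


# Two hypothetical shards, each returning its own top groups.
shard_a = [GroupHead("red", "a1", 3.2), GroupHead("blue", "a7", 1.5)]
shard_b = [GroupHead("blue", "b4", 2.9), GroupHead("green", "b9", 2.0)]

top = merge_single_step([shard_a, shard_b], k=2)
for head in top:
    print(head.group, head.doc_id, head.score)
```

In this made-up example the "blue" group appears on both shards, and the merge keeps the higher scoring document (`b4`) as its single representative. Per-group match counts are not available from this merge, which is consistent with condition a).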



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

