[
https://issues.apache.org/jira/browse/SOLR-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793052#comment-16793052
]
Christine Poerschke commented on SOLR-11831:
--------------------------------------------
Hello [~mjosephidou] and [~diegoceccarelli], thank you for opening this ticket
and associated pull request!
I've started to take a look at the changes and hope to look more next week.
A few notes in the meantime:
* For clarity, the ASF GitHub Bot mirrors pull request comments into the
'Work Log' rather than into 'Comments'.
* Of course anyone can comment on the pull request or on the JIRA ticket here,
whatever works.
* Your pull request currently includes changes to both Lucene and Solr code;
nothing wrong with that. I've opened LUCENE-8728 separately (for clarity,
hopefully) to explore whether the classes in question should be modified,
extended, or handled in some other way.
> Skip second grouping step if group.limit is 1 (aka Las Vegas patch)
> --------------------------------------------------------------------
>
> Key: SOLR-11831
> URL: https://issues.apache.org/jira/browse/SOLR-11831
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Malvina Josephidou
> Priority: Minor
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> When we do grouping and ask for {{group.limit=1}} only, it is possible to
> skip the second grouping step. On our test datasets this improved speed by
> around 40%.
> Essentially, in the first grouping step each shard returns the top K groups
> based on the highest scoring document in each group. The top K groups from
> each shard are merged in the federator and in the second step we ask all the
> shards to return the top documents from each of the top ranking groups.
> If we only want to return the highest scoring document per group we can
> return the top document id in the first step, merge results in the federator
> to retain the top K groups and then skip the second grouping step entirely.
> This is possible provided that:
> a) We do not need to know the total number of matching documents per group.
> b) The within-group sort and the between-group sort are the same.
> c) We are not doing reranking, because reranking is done in the second
> grouping step. (It is possible to get this to work with reranking, but
> more work and some additional assumptions are required.)
>
> This patch applies the grouping optimisation in cases where a)-c) apply and
> we are only sorting by relevance. It is also possible to extend this work to
> handle multiple sorting criteria and also reranking.
> P.S. Diego and I called this patch "Las Vegas" because we started to write it
> on the flight to Las Vegas for Lucene/Solr Revolution.
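
To make the quoted description concrete, here is a minimal sketch of the idea
in Python. It is not the code from the pull request: the GroupHit type, the
function names, and the argument names are all hypothetical and exist only for
illustration. It shows (1) a check corresponding to conditions a)-c) plus the
"sorting by relevance only" restriction, and (2) a federator-side merge that
keeps, for each group, the best document reported by any shard and then
retains the top K groups, so that no second request to the shards is needed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GroupHit:
    """Hypothetical first-step entry: a group plus its single best document."""
    group_value: str
    top_doc_id: str
    top_score: float


def can_skip_second_step(group_limit: int,
                         need_group_counts: bool,
                         between_group_sort: str,
                         within_group_sort: str,
                         reranking: bool) -> bool:
    """Mirror conditions a)-c) from the description (argument names are assumptions)."""
    relevance_only = between_group_sort == "score desc"
    return (group_limit == 1
            and not need_group_counts                    # a) no per-group totals
            and between_group_sort == within_group_sort  # b) same sort
            and not reranking                            # c) no reranking
            and relevance_only)                          # patch scope: relevance only


def merge_first_step(shard_results: List[List[GroupHit]], k: int) -> List[GroupHit]:
    """Merge per-shard top groups, keeping the best document seen for each group."""
    best: Dict[str, GroupHit] = {}
    for hits in shard_results:
        for hit in hits:
            current = best.get(hit.group_value)
            if current is None or hit.top_score > current.top_score:
                best[hit.group_value] = hit
    # Rank groups by their best document's score and keep the top K.
    ranked = sorted(best.values(), key=lambda h: h.top_score, reverse=True)
    return ranked[:k]
```

Because each merged entry already carries the id of the group's best document,
the result of merge_first_step is the final grouped response when
group.limit=1; this is the step that replaces the second round trip. Tie
handling and sorts other than relevance are deliberately left out of the
sketch.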