cpoerschke commented on a change in pull request #300: SOLR-11831: Skip second
grouping step if group.limit is 1 (aka Las Vegas Patch)
URL: https://github.com/apache/lucene-solr/pull/300#discussion_r267518970
##########
File path:
solr/core/src/java/org/apache/solr/search/grouping/distributed/responseprocessor/SearchGroupShardResponseProcessor.java
##########
@@ -143,12 +151,61 @@ public void process(ResponseBuilder rb, ShardRequest
shardRequest) {
if (mergedTopGroups == null) {
continue;
}
-
- rb.mergedSearchGroups.put(groupField, mergedTopGroups);
- for (SearchGroup<BytesRef> mergedTopGroup : mergedTopGroups) {
- rb.searchGroupToShards.get(groupField).put(mergedTopGroup,
tempSearchGroupToShards.get(groupField).get(mergedTopGroup));
+ if (rb.getGroupingSpec().isSkipSecondGroupingStep()){
+ /* If we are skipping the second grouping step we want to translate
the response of the
+ * first step in the response of the second step and send it to the
get_fields step.
+ */
+ processSkipSecondGroupingStep(rb, mergedTopGroups,
tempSearchGroupToShards, docIdToShard, groupSort, fields, groupField);
Review comment:
Two things and one question 'jumped out' at me when looking at this specific
proposed change here:
(1) The new local `docIdToShard` variable is allocated and accumulated on
both if-and-else code paths but only used in the if-case.
(2) The `processSkipSecondGroupingStep` takes seven arguments which
(subjectively) seems a lot.
(3) question: There's no `rb.mergedSearchGroups.put` and
`rb.searchGroupToShards` in the if-case -- why might that be and would that be
okay?
Taking a closer look I learnt the following:
(4) In the `processSkipSecondGroupingStep` method there is an
`rb.searchGroupToShards.put` call.
(5) Two of the `processSkipSecondGroupingStep` arguments are contained in
the `rb` argument, namely `groupSort` and `fields`.
(6) One of the `processSkipSecondGroupingStep` arguments i.e.
`tempSearchGroupToShards` is used only for the `rb.searchGroupToShards.put`
call. If that call could be factored out of the method then that would reduce
the if-vs-else-case differences and save one method argument.
(7) The new local `docIdToShard` variable has conceptual similarity to the
existing `tempSearchGroupToShards` variable i.e both map 'something' to a shard
or shards:
(7a) `docIdToShard` is a one-to-one mapping since a single document lives on
exactly one shard
(7b) `tempSearchGroupToShards` is a one-to-mapping mapping since a single
group can contain multiple documents that (between them) live on multiple shards
Next then I did some exploratory refactoring to see if/how:
(8) unnecessary allocation and accumulation of `docIdToShard` might be
avoided for (existing) code paths that do not or cannot use the
`group.skip.second.step=true` optimisation
(9) the `processSkipSecondGroupingStep` signature might be narrowed to take
fewer than seven arguments
What emerged was a deceptively simple idea:
(10) replace the `Map<String, Map<SearchGroup<BytesRef>, Set<String>>>
tempSearchGroupToShards;` local variable with a `SearchGroupsContainer
searchGroupsContainer;` local variable and give the `SearchGroupsContainer`
inner class methods that do what the code currently does i.e. just code
refactoring
(11) for the `group.skip.second.step=true` optimisation extend the
`SearchGroupsContainer` class and override methods as required
https://github.com/cpoerschke/lucene-solr/tree/github-bloomberg-SOLR-11831-cpoerschke-4
shares what emerged, it compiles but is totally untested:
*
https://github.com/cpoerschke/lucene-solr/commit/6f53499edaa9df5dc1a1ace371872404df75ee84
shows the differences relative to the branch behind the pull request here
*
https://github.com/apache/lucene-solr/compare/5c143022e7abcdf14a570786afec4ff099fd581c...6f53499edaa9df5dc1a1ace371872404df75ee84
shows the ('all in') differences relative to the master branch baseline code
What do you think?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]