Tim Armstrong has posted comments on this change. Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations ......................................................................
Patch Set 1: (4 comments) I thought some more about this during the day and I think there are some subtle issues around spilling and passthrough aggs with limits that we don't have test coverage for. http://gerrit.cloudera.org:8080/#/c/3822/1/be/src/exec/partitioned-aggregation-node-ir.cc File be/src/exec/partitioned-aggregation-node-ir.cc: Line 130: if (group_count_ < max_groups_) { > We can turn this into a one-liner without an else. This will also give incorrect results for spilling aggregations. If AGGREGATED_ROWS is true, this is processing a spilled row that was already aggregated, so will double-count that group. I think we should add a test to test_spilling.py that exercises this code path (e.g. a large agg with a high limit). Line 250: if (group_count_ >= max_groups_) { I think this should should actually occur before the remaining_capacity check. Otherwise we'll pass through rows even when we already have enough groups (meaning the limit isn't correctly enforced and we do unnecessary work). http://gerrit.cloudera.org:8080/#/c/3822/1/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: PS1, Line 506: ReachedLimit Do we actually need these limit checks for grouping aggregations now? If not, we should remove the redundant checks and document how the limit is enforced. Line 519: Status PartitionedAggregationNode::GetRowsStreaming(RuntimeState* state, We don't actually implement limit checks in GetRowsStreaming(), since previously limits were never applied. I think in some circumstances (e.g. if it switches into streaming mode) the current version of the patch will return more rows than the limit. I think if you fix my other comment in -ir.cc this may not be the case, but it would be good to think through and document why the explicit limit check isn't necessary (if it indeed isn't). -- To view, visit http://gerrit.cloudera.org:8080/3822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59c5b7af7a73ccdbc5496b28eacb9b6859d202bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
