yashmayya opened a new pull request, #13564: URL: https://github.com/apache/pinot/pull/13564
- Aggregation queries without `GROUP BY` clauses typically aren't used with a `LIMIT` clause, since it doesn't make sense semantically.
- However, such queries aren't considered invalid, and most databases (including Postgres and MySQL) return the same aggregation results (i.e., over the entire table) regardless of what the `LIMIT` is, except if the `LIMIT` is 0, in which case an empty result set is returned.
- The v1 query engine in Pinot follows the former but not the latter: it returns the same aggregation results (i.e., over the entire table) regardless of what the `LIMIT` is, even if it is 0.
- This patch standardizes the Pinot behavior in this case by introducing an `EmptyAggregationOperator` (somewhat similar to the [EmptySelectionOperator](https://github.com/apache/pinot/blob/efa43007adc1dd7736580d882f33956e359b0678/pinot-core/src/main/java/org/apache/pinot/core/operator/query/EmptySelectionOperator.java#L39)):

```
> EXPLAIN PLAN FOR SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;
PLAN_START(numSegmentsForThisPlan:31)
BROKER_REDUCE(limit:0)
COMBINE_AGGREGATE
AGGREGATE_EMPTY
```

```
> SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;
{"resultTable":{"dataSchema":{"columnNames":["sum(ArrDelay)"],"columnDataTypes":["DOUBLE"]},"rows":[]},"numRowsResultSet":0,"partialResult":false,"exceptions":[],"numGroupsLimitReached":false,"timeUsedMs":3,"requestId":"11345490000000018","brokerId":"Broker_192.168.29.25_8000","numDocsScanned":0,"totalDocs":9746,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":31,"numSegmentsProcessed":31,"numSegmentsMatched":0,"numConsumingSegmentsQueried":0,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"minConsumingFreshnessTimeMs":0,"numSegmentsPrunedByBroker":0,"numSegmentsPrunedByServer":0,"numSegmentsPrunedInvalid":0,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":0,"brokerReduceTimeMs":0,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"explainPlanNumEmptyFilterSegments":0,"explainPlanNumMatchAllFilterSegments":0,"traceInfo":{}}
```

- This patch fixes one of the cases described in https://github.com/apache/pinot/issues/13563.
- Note that this isn't an issue in the v2 multi-stage query engine because the Calcite planner is smart enough to prune sections of query plans that are known to never produce any rows.
```
> SET useMultiStageEngine = true; EXPLAIN PLAN FOR SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;
Execution Plan
LogicalValues(tuples=[[]])
```

- The existing test `testGroupByAggregationWithLimitZero`, recently added in https://github.com/apache/pinot/pull/13555, is updated because the integration test doesn't actually do any comparison for aggregation group-by queries without an `ORDER BY` clause (also, in this case we just want to verify that 0 rows are returned and that the data schema is still returned): https://github.com/apache/pinot/blob/efa43007adc1dd7736580d882f33956e359b0678/pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/ClusterIntegrationTestUtils.java#L812-L817
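
The `LIMIT` semantics this PR aligns with can be reproduced against an embedded database. The following is a minimal sketch, not part of the patch: it uses SQLite (which this PR does not mention) as a stand-in for the Postgres/MySQL behavior described above, and the table contents, the `total_delay` alias, and the `sum_with_limit` helper are made up for illustration.

```python
import sqlite3


def sum_with_limit(limit: int) -> tuple[list[str], list[tuple]]:
    """Run an aggregation without GROUP BY under the given LIMIT.

    Returns (column names, rows). The table and its values are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE airlineStats (ArrDelay INTEGER)")
        conn.executemany(
            "INSERT INTO airlineStats VALUES (?)", [(5,), (10,), (-3,)]
        )
        cursor = conn.execute(
            "SELECT SUM(ArrDelay) AS total_delay FROM airlineStats LIMIT ?",
            (limit,),
        )
        # The column metadata is available even when no rows come back.
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for limit in (0, 1, 10):
        columns, rows = sum_with_limit(limit)
        print(f"LIMIT {limit}: columns={columns} rows={rows}")
```

With `LIMIT 0` the row list is empty while the column name is still reported, which mirrors the empty `rows` plus populated `dataSchema` in the Pinot response above. Any positive `LIMIT` yields the same single aggregated row computed over the whole table.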
