yashmayya opened a new pull request, #13564:
URL: https://github.com/apache/pinot/pull/13564

   - Aggregation queries without a `GROUP BY` clause are rarely combined with 
a `LIMIT` clause, since the `LIMIT` has little semantic meaning there.
   - However, it isn't considered invalid, and most databases (including 
Postgres and MySQL) return the same aggregation results (i.e., over the entire 
table) regardless of what the `LIMIT` is - except if the `LIMIT` is 0, in which 
case an empty result set is returned.
   - The v1 query engine in Pinot matches the first part of this behavior but 
not the second: it returns the aggregation results over the entire table 
regardless of what the `LIMIT` is, even when the `LIMIT` is 0.
   - This patch standardizes the Pinot behavior in this case by introducing an 
`EmptyAggregationOperator` (somewhat similar to the 
[EmptySelectionOperator](https://github.com/apache/pinot/blob/efa43007adc1dd7736580d882f33956e359b0678/pinot-core/src/main/java/org/apache/pinot/core/operator/query/EmptySelectionOperator.java#L39)):
   
   ```
   > EXPLAIN PLAN FOR SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;
   
   PLAN_START(numSegmentsForThisPlan:31)
   BROKER_REDUCE(limit:0)
   COMBINE_AGGREGATE
   AGGREGATE_EMPTY
   ```
   
   ```
   > SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;
   
   
   {"resultTable":{"dataSchema":{"columnNames":["sum(ArrDelay)"],"columnDataTypes":["DOUBLE"]},"rows":[]},"numRowsResultSet":0,"partialResult":false,"exceptions":[],"numGroupsLimitReached":false,"timeUsedMs":3,"requestId":"11345490000000018","brokerId":"Broker_192.168.29.25_8000","numDocsScanned":0,"totalDocs":9746,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":31,"numSegmentsProcessed":31,"numSegmentsMatched":0,"numConsumingSegmentsQueried":0,"numConsumingSegmentsProcessed":0,"numConsumingSegmentsMatched":0,"minConsumingFreshnessTimeMs":0,"numSegmentsPrunedByBroker":0,"numSegmentsPrunedByServer":0,"numSegmentsPrunedInvalid":0,"numSegmentsPrunedByLimit":0,"numSegmentsPrunedByValue":0,"brokerReduceTimeMs":0,"offlineThreadCpuTimeNs":0,"realtimeThreadCpuTimeNs":0,"offlineSystemActivitiesCpuTimeNs":0,"realtimeSystemActivitiesCpuTimeNs":0,"offlineResponseSerializationCpuTimeNs":0,"realtimeResponseSerializationCpuTimeNs":0,"offlineTotalCpuTimeNs":0,"realtimeTotalCpuTimeNs":0,"explainPlanNumEmptyFilterSegments":0,"explainPlanNumMatchAllFilterSegments":0,"traceInfo":{}}
   ```
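
To make the intended semantics concrete, here is a minimal Java sketch. It is not the actual `EmptyAggregationOperator` implementation: the class `LimitZeroAggregationSketch`, the `ResultTable` holder, and the `sumWithLimit` method are hypothetical names used only for illustration, and the sketch assumes a single `SUM` over an in-memory array with a non-negative limit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of a non-GROUP BY SUM query with a LIMIT. Hypothetical, not Pinot code. */
class LimitZeroAggregationSketch {

  /** Minimal stand-in for a result table: column names plus rows. */
  public static final class ResultTable {
    public final List<String> columnNames;
    public final List<double[]> rows;

    public ResultTable(List<String> columnNames, List<double[]> rows) {
      this.columnNames = columnNames;
      this.rows = rows;
    }
  }

  /**
   * Models SELECT SUM(column) FROM table LIMIT limit.
   * Assumes limit >= 0; negative limits are not handled in this sketch.
   */
  public static ResultTable sumWithLimit(String column, double[] values, int limit) {
    // The schema is always returned, even when no rows are.
    List<String> schema = Collections.singletonList("sum(" + column + ")");

    if (limit == 0) {
      // LIMIT 0: skip the scan entirely and return an empty result set.
      return new ResultTable(schema, Collections.emptyList());
    }

    // Any positive LIMIT: aggregate over the whole table and return one row.
    double sum = 0.0;
    for (double value : values) {
      sum += value;
    }
    List<double[]> rows = new ArrayList<>();
    rows.add(new double[] {sum});
    return new ResultTable(schema, rows);
  }

  public static void main(String[] args) {
    double[] delays = {5.0, -2.0, 12.0};

    ResultTable empty = sumWithLimit("ArrDelay", delays, 0);
    System.out.println(empty.columnNames + " rows=" + empty.rows.size());

    ResultTable full = sumWithLimit("ArrDelay", delays, 10);
    System.out.println(full.columnNames + " rows=" + full.rows.size());
  }
}
```

The point of the sketch is that the `LIMIT 0` branch never touches the data, which is consistent with `numDocsScanned` being 0 in the response above, while any positive `LIMIT` leaves the aggregate unchanged.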
   
   - This patch fixes one of the cases described in 
https://github.com/apache/pinot/issues/13563.
   - Note that this isn't an issue in the v2 multi-stage query engine because 
the Calcite planner is smart enough to prune sections of query plans that are 
known to never produce any rows.
   ```
   > SET useMultiStageEngine = true; EXPLAIN PLAN FOR SELECT SUM(ArrDelay) FROM airlineStats LIMIT 0;
   
   Execution Plan
   LogicalValues(tuples=[[]])
   ```
   - The existing test `testGroupByAggregationWithLimitZero`, recently added in 
https://github.com/apache/pinot/pull/13555, is updated because the integration 
test doesn't actually do any comparison for group-by aggregation queries 
without an `ORDER BY` clause (see 
https://github.com/apache/pinot/blob/efa43007adc1dd7736580d882f33956e359b0678/pinot-integration-test-base/src/test/java/org/apache/pinot/integration/tests/ClusterIntegrationTestUtils.java#L812-L817). 
In this case, we only want to verify that 0 rows are returned and that the 
data schema is still returned.

