npawar opened a new pull request #4602: First pass of GROUP BY with ORDER BY support
URL: https://github.com/apache/incubator-pinot/pull/4602

This PR contains the implementation of ORDER BY support in GROUP BY queries. In this first pass, the changes have been made from `CombineGroupByOrderByOperator` upwards. The `AggregationGroupByOperator` hasn't been changed.

`IndexedTable` is used wherever possible (to merge results in `CombineGroupByOrderByOperator`, and then to reduce results across servers in the `BrokerReduceService`).

`ResultTable` has been introduced as a standard way to return results to the client.

Two `queryOptions` have been introduced:
1. `groupByMode` - pql/sql - whether to execute the group by in PQL style (split all aggregations and ignore order by) or in standard SQL style
2. `responseFormat` - pql/sql - whether to present results using `List<AggregationResult>` (the PQL way) or `ResultTable`, which is closer to the SQL way

By default, the modes are PQL, PQL.
To get the order by results in `ResultTable`, the modes should be SQL, SQL.
To get the order by results, but in `List<AggregationResult>`, the modes should be SQL, PQL.

These modes can be added to the JSON payload:
`curl -H "Content-Type: application/json" -X POST -d '{"sql":"select count(*) from table group by dim1 order by dim1","queryOptions":"groupByMode=sql;responseFormat=sql"}' http://localhost:8099/query`

Pending: Benchmarking. The performance of `SELECT agg1 FROM table GROUP BY group1, group2 ORDER BY agg1 DESC` should be compared with that of the original `SELECT agg1 FROM table GROUP BY group1, group2`, as the results are expected to be identical. We can also compare `SELECT agg1,agg2... FROM table GROUP BY group1, group2 ORDER BY agg1 DESC` with the original `SELECT agg1,agg2... FROM table GROUP BY group1, group2`. The groups will be different in the latter pair, but the queries are comparable in terms of result size.

Next steps: Push `IndexedTable` down into the `AggregationGroupByOperator`.
We can introduce new operators for each strategy we're trying out (one `ConcurrentIndexedTable`, multiple `SimpleIndexedTable`s, etc.).
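
For anyone trying the two `queryOptions` from a script rather than curl, here is a minimal sketch of how the JSON payload described above could be assembled. The helper names `build_query_options` and `build_payload` are hypothetical and not part of this PR; the only things taken from the description are the `sql` and `queryOptions` keys, the `groupByMode` / `responseFormat` option names, the `pql`/`sql` values, and the PQL, PQL default. Sending the default values explicitly is assumed to behave the same as omitting them.

```python
import json

# The PR description lists exactly these two values for both options.
VALID_MODES = {"pql", "sql"}


def build_query_options(group_by_mode="pql", response_format="pql"):
    """Return the semicolon-separated queryOptions string.

    Defaults mirror the PQL, PQL default stated in the PR description.
    """
    for name, value in (("groupByMode", group_by_mode),
                        ("responseFormat", response_format)):
        if value not in VALID_MODES:
            raise ValueError(f"{name} must be 'pql' or 'sql', got {value!r}")
    return f"groupByMode={group_by_mode};responseFormat={response_format}"


def build_payload(sql, group_by_mode="pql", response_format="pql"):
    """Return the JSON request body as a string (hypothetical helper)."""
    return json.dumps({
        "sql": sql,
        "queryOptions": build_query_options(group_by_mode, response_format),
    })


if __name__ == "__main__":
    # SQL, SQL: ORDER BY is honored and results come back as a ResultTable.
    print(build_payload(
        "select count(*) from table group by dim1 order by dim1",
        group_by_mode="sql",
        response_format="sql",
    ))
```

The printed string can be passed as the `-d` argument of the curl command above. Passing `group_by_mode="sql", response_format="pql"` gives the SQL, PQL combination, i.e. ordered results in `List<AggregationResult>`.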