npawar opened a new pull request #4602: First pass of GROUP BY with ORDER BY 
support
URL: https://github.com/apache/incubator-pinot/pull/4602
 
 
   This PR implements ORDER BY support for GROUP BY queries.
   
   In this first pass, changes are made only from 
`CombineGroupByOrderByOperator` upwards. `AggregationGroupByOperator` has not 
been changed.
   
   `IndexedTable` is used wherever possible: to merge results in 
`CombineGroupByOrderByOperator`, and then to reduce results across servers in 
`BrokerReduceService`.
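
To illustrate the merge-then-order idea, here is a minimal sketch of a table keyed by group values. It is not Pinot's `IndexedTable`: the class name `IndexedTableSketch`, its methods, and the choice of a single SUM aggregate are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Pinot's IndexedTable: group key -> running SUM.
class IndexedTableSketch {
  private final Map<List<String>, Long> sums = new HashMap<>();

  // Insert a new group, or merge the value into an existing group.
  void upsert(List<String> groupKey, long value) {
    sums.merge(groupKey, value, Long::sum);
  }

  // Merge another partial table, e.g. the result from another segment or server.
  void mergeFrom(IndexedTableSketch other) {
    other.sums.forEach(this::upsert);
  }

  // Order by the aggregate, descending, and keep at most `limit` groups.
  List<Map.Entry<List<String>, Long>> topDescending(int limit) {
    List<Map.Entry<List<String>, Long>> rows = new ArrayList<>(sums.entrySet());
    rows.sort(Map.Entry.<List<String>, Long>comparingByValue(Comparator.reverseOrder()));
    return rows.subList(0, Math.min(limit, rows.size()));
  }
}
```

The same structure can serve both levels described above: once to combine per-segment results on a server, and again to reduce per-server results on the broker. Ordering and trimming happen only after all partial results are merged.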
   
   `ResultTable` has been introduced, as a standard way to return results to 
the client.
   
   Two `queryOptions` have been introduced:
   1. `groupByMode` (pql/sql): whether to execute the group by in PQL style 
(split all aggregations and ignore ORDER BY) or in standard SQL style.
   2. `responseFormat` (pql/sql): whether to present results as 
List<AggregationResult> (the PQL way) or as `ResultTable`, which is closer to 
the SQL way.
   
   By default, both modes are PQL.
   To get the ORDER BY results in a `ResultTable`, set groupByMode=sql and 
responseFormat=sql.
   To get the ORDER BY results as List<AggregationResult>, set groupByMode=sql 
and responseFormat=pql.
   These modes can be added to the JSON payload:
   `curl -H "Content-Type: application/json" -X POST -d '{"sql":"select 
count(*) from table group by dim1 order by 
dim1","queryOptions":"groupByMode=sql;responseFormat=sql"}' 
http://localhost:8099/query`
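
For callers that build the request programmatically, the payload from the curl example can be assembled as in the sketch below. The class `QueryPayloadSketch` and its helper methods are hypothetical. Only the `sql` and `queryOptions` keys, the option names, and the `key=value;key=value` format come from the example above. The escaping is deliberately minimal and is not a general JSON encoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the JSON body shown in the curl example.
class QueryPayloadSketch {
  // Joins options as key=value pairs separated by ';'.
  static String joinOptions(Map<String, String> options) {
    return options.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(";"));
  }

  // Minimal escaping (backslash and double quote only); an assumption that
  // the query contains no control characters.
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  static String buildPayload(String sql, Map<String, String> options) {
    return "{\"sql\":\"" + escape(sql) + "\",\"queryOptions\":\""
        + escape(joinOptions(options)) + "\"}";
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the options in insertion order.
    Map<String, String> options = new LinkedHashMap<>();
    options.put("groupByMode", "sql");
    options.put("responseFormat", "sql");
    System.out.println(buildPayload(
        "select count(*) from table group by dim1 order by dim1", options));
  }
}
```

Setting `responseFormat` to `pql` in the same map gives the SQL,PQL combination.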
   
   Pending: Benchmarking.
   `SELECT agg1 FROM table GROUP BY group1, group2 ORDER BY agg1 DESC` should 
be compared against the original `SELECT agg1 FROM table GROUP BY group1, 
group2`, as the results are expected to be identical.
   We can also compare `SELECT agg1,agg2... FROM table GROUP BY group1, group2 
ORDER BY agg1 DESC` with the original `SELECT agg1,agg2... FROM table GROUP BY 
group1, group2`. The groups returned by the latter will differ, but the result 
sizes are comparable.
   
   Next steps: Push `IndexedTable` down into `AggregationGroupByOperator`. We 
can introduce new operators, one for each strategy we're trying out (one 
ConcurrentIndexedTable, multiple SimpleIndexedTable, etc.).
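
The two strategies named above can be contrasted with a rough sketch. Everything here is an assumption for illustration: plain maps stand in for `ConcurrentIndexedTable` and `SimpleIndexedTable`, each inner list stands in for the group keys of one segment, and the aggregate is a simple COUNT(*).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical comparison of the two strategies; not Pinot code.
class GroupByStrategySketch {
  // Strategy 1: all worker threads upsert into one shared concurrent table.
  static Map<String, Long> sharedConcurrentTable(List<List<String>> segments) {
    Map<String, Long> shared = new ConcurrentHashMap<>();
    List<Thread> workers = new ArrayList<>();
    for (List<String> segment : segments) {
      Thread t = new Thread(() -> segment.forEach(key -> shared.merge(key, 1L, Long::sum)));
      workers.add(t);
      t.start();
    }
    joinAll(workers);
    return shared;
  }

  // Strategy 2: each worker fills its own non-thread-safe table,
  // and the partial tables are merged after all workers finish.
  static Map<String, Long> perThreadTables(List<List<String>> segments) {
    List<Map<String, Long>> partials = new ArrayList<>();
    List<Thread> workers = new ArrayList<>();
    for (List<String> segment : segments) {
      Map<String, Long> local = new HashMap<>();
      partials.add(local);
      Thread t = new Thread(() -> segment.forEach(key -> local.merge(key, 1L, Long::sum)));
      workers.add(t);
      t.start();
    }
    joinAll(workers);
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> partial : partials) {
      partial.forEach((k, v) -> merged.merge(k, v, Long::sum));
    }
    return merged;
  }

  private static void joinAll(List<Thread> workers) {
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
  }
}
```

Both methods produce the same groups and counts. They differ in where the cost falls: contention on the shared table in the first, and an extra merge pass in the second. The pending benchmarking would be expected to quantify that difference.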

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

