somandal opened a new pull request, #8758: URL: https://github.com/apache/pinot/pull/8758
Today, aggregate group-by queries without order-by can be inaccurate and non-deterministic. The results are truncated at multiple stages (segment level and server level), and depending on the order in which the rows are processed, the aggregate group-by can return very different results when the limit on the number of results to be returned is smaller than the total number of rows matching the query. Aggregate group-by with order-by, on the other hand, has to keep track of the top K results based on the ordering criteria, which makes the results more accurate and deterministic.

This PR adds a new query rewriter that rewrites aggregate group-by only queries to include order-by based on the group-by predicates. The rewriter is not added to the list of default query rewriters, but this can be overridden via the broker-side config.

Rewriting aggregate group-by only queries to include order-by can lead to a performance hit compared to plain group-by only queries, because the results need to be sorted and, for queries that include order-by, some processing is done under a lock to trim the data structure.

cc @siddharthteotia
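
To illustrate the problem described above, here is a minimal toy model in Python. It is not Pinot code: the function names, the two-stage "segment then merge" pipeline, and the choice to order by the group key are all assumptions made for illustration. It only shows how trimming groups in arrival order makes the answer depend on row order, while trimming to the top K by an ordering key does not.

```python
def aggregate_segment(rows, limit, order_by_key):
    """Toy SUM(value) GROUP BY key over one segment, trimmed to `limit` groups."""
    sums = {}
    for key, value in rows:
        # dicts preserve insertion order, so this records arrival order of groups
        sums[key] = sums.get(key, 0) + value
    # With order-by: keep the top `limit` groups by key.
    # Without: keep whichever groups happened to be seen first.
    keys = sorted(sums) if order_by_key else list(sums)
    return {k: sums[k] for k in keys[:limit]}


def merge(partials, limit, order_by_key):
    """Toy server-level merge of per-segment results, trimmed again."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    keys = sorted(merged) if order_by_key else list(merged)
    return {k: merged[k] for k in keys[:limit]}


def run_query(segments, limit, order_by_key):
    partials = [aggregate_segment(s, limit, order_by_key) for s in segments]
    return merge(partials, limit, order_by_key)


# Hypothetical data: true totals are a=21, b=32, c=13.
seg_a = [("a", 1), ("b", 2), ("c", 3)]
seg_b = [("c", 10), ("a", 20), ("b", 30)]
segments = [seg_a, seg_b]
# Same rows, processed in the opposite order within each segment.
reordered = [list(reversed(seg_a)), list(reversed(seg_b))]

# Group-by only: the result depends on row order and contains partial sums.
print(run_query(segments, 2, order_by_key=False))   # {'a': 21, 'b': 2}
print(run_query(reordered, 2, order_by_key=False))  # {'c': 3, 'b': 32}

# Group-by with order-by on the key: same, exact result either way.
print(run_query(segments, 2, order_by_key=True))    # {'a': 21, 'b': 32}
print(run_query(reordered, 2, order_by_key=True))   # {'a': 21, 'b': 32}
```

In this toy model the order-by variant pays for its stability with a sort at every stage, which mirrors the performance trade-off mentioned above.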
