[GitHub] [druid] jobar opened a new issue #11264: SQL query uses TopN when grouping by time and other dimension

GitBox Mon, 17 May 2021 09:18:04 -0700


jobar opened a new issue #11264:
URL: https://github.com/apache/druid/issues/11264



   ### Description
   
   When a query sent to the Druid SQL endpoint groups by two fields and one of 
them is time-related, the planner could use the TopN query type instead of 
groupBy, with the time grouping expressed as the query's "granularity".
   
   Example:
   ```
   EXPLAIN PLAN FOR
   SELECT FLOOR("__time" TO MONTH) AS "__timestamp",
          "my_field" AS "my_field",
          SUM(my_value) AS "sum_my_value"
   FROM my_data_source
   WHERE "__time" >= '...'
     AND "__time" < '...'
   GROUP BY "my_field", FLOOR("__time" TO MONTH)
   ORDER BY sum_my_value
   LIMIT 500;
   
   
DruidQueryRel(query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"my_data_source"}...
   ```
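   
   For illustration, the native query this request has in mind might look 
roughly like the sketch below. It is hand-written, not planner output. The 
`doubleSum` aggregator is an assumption (the type of `my_value` is not stated), 
the `inverted` metric reflects the ascending `ORDER BY`, and the interval is 
left elided as in the SQL above. Note that a native TopN applies its 
`threshold` within each granularity bucket and returns approximate results, so 
it is not a strict equivalent of the global `LIMIT 500`.
   
   ```json
   {
     "queryType": "topN",
     "dataSource": {"type": "table", "name": "my_data_source"},
     "intervals": ["..."],
     "granularity": "month",
     "dimension": "my_field",
     "metric": {
       "type": "inverted",
       "metric": {"type": "numeric", "metric": "sum_my_value"}
     },
     "threshold": 500,
     "aggregations": [
       {"type": "doubleSum", "name": "sum_my_value", "fieldName": "my_value"}
     ]
   }
   ```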
   
   ### Motivation
   
   GroupBy queries are considerably more expensive than TopN queries, so 
planning this case as a TopN query would return results faster and at lower 
cost.
   




