kaijianding commented on pull request #11379:
URL: https://github.com/apache/druid/pull/11379#issuecomment-876371771


   > I don't think the number of rows should be different in those cases. This optimization not just modifies the granularity but also dimensions. When granularity is set to ALL, the dimensions should have the virtual column for the timefloor function. When granularity is set to HOUR, the dimensions should not have that virtual column. In these cases, the actual keys used for grouping are effectively the same because the ALL granularity groups all rows into one bucket. As a result, the number of rows must be the same in both cases.
   
   > select floor(__time to hour),dim1 from a_table group by floor(__time to hour),dim1 limit 10
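
To make the quoted point concrete: without a LIMIT, grouping by the floored hour plus `dim1` inside a single ALL bucket yields the same groups as bucketing by HOUR and grouping by `dim1` alone. This is a simplified Python model, not Druid's query engine. The sample rows (timestamps in minutes) and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows: (timestamp_in_minutes, dim1).
ROWS = [(12, "a"), (42, "a"), (54, "b"), (66, "a"), (90, "c"), (138, "b")]


def group_all_with_virtual_column(rows):
    """granularity=ALL: one bucket; floor(__time to hour) is an extra grouping key."""
    groups = Counter()
    for ts, dim1 in rows:
        groups[(ts // 60, dim1)] += 1  # virtual column: hour floor of the timestamp
    return groups


def group_hour_granularity(rows):
    """granularity=HOUR: rows are bucketed per hour first; only dim1 is a dimension."""
    buckets = {}
    for ts, dim1 in rows:
        buckets.setdefault(ts // 60, Counter())[dim1] += 1
    # Flatten (hour bucket, dim1) into the same key shape for comparison.
    return Counter({(hour, dim1): n for hour, c in buckets.items() for dim1, n in c.items()})
```

Both functions return the same five groups for `ROWS`. The difference only appears once a per-cursor LIMIT is applied, which is the point addressed in the reply.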
   
   When granularity=ALL, `cursors` contains only 1 element, and only 10 results are produced on the compute node.
   When granularity=HOUR, `cursors` contains 24 elements for 1 day, and 10 results are produced for each `cursor` on the compute node. That is 24 times the cost for 1 day; if the time interval is 1 year, it is 8,760 times (365 days x 24 hours).
   This is the basic reason I think it's a bad idea to change the granularity from ALL to HOUR.
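
The cost argument can be sketched as simple arithmetic. This is a simplified model, assuming each cursor stops after producing `limit` rows and each hour bucket holds at least `limit` distinct groups. `cursor_count` and `rows_produced` are hypothetical names, not Druid APIs.

```python
def cursor_count(interval_hours: int, granularity: str) -> int:
    """Number of per-bucket cursors a segment scan creates (simplified model)."""
    if granularity == "ALL":
        return 1  # a single cursor spans the whole interval
    if granularity == "HOUR":
        return interval_hours  # one cursor per hour bucket
    raise ValueError(f"unsupported granularity: {granularity}")


def rows_produced(interval_hours: int, granularity: str, limit: int) -> int:
    """Upper bound on rows produced on a compute node when each cursor stops at `limit`."""
    return cursor_count(interval_hours, granularity) * limit
```

For a 1-year interval, `rows_produced(24 * 365, "HOUR", 10)` is 87,600 versus 10 for ALL, which is the 8,760-fold difference described above.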

