kaijianding commented on pull request #11379: URL: https://github.com/apache/druid/pull/11379#issuecomment-876371771
> I don't think the number of rows should be different in those cases. This optimization not just modifies the granularity but also dimensions. When granularity is set to ALL, the dimensions should have the virtual column for the timefloor function. When granularity is set to HOUR, the dimensions should not have that virtual column. In these cases, the actual keys used for grouping are effectively the same because the ALL granularity groups all rows into one bucket. As a result, the number of rows must be the same in both cases.

> select floor(__time to hour),dim1 from a_table group by floor(__time to hour),dim1 limit 10

When granularity=ALL, `cursors` contains only 1 element, and only 10 results are produced on the compute node.

When granularity=HOUR, `cursors` contains 24 elements for 1 day, and 10 results are produced by each cursor on the compute node. That is 24 times the cost for 1 day. If the time interval is 1 year, it is 8,760 times the cost.

This is the basic reason I think it's a bad idea to change the granularity from ALL to HOUR.
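
To make the cost argument concrete, here is a rough Python sketch of the arithmetic. It is not Druid code: `hourly_cursor_count` and `max_rows_on_compute_node` are hypothetical helpers, and the sketch assumes one cursor per HOUR bucket, a query interval aligned to hour boundaries, and that every cursor yields the full `limit` rows.

```python
import math
from datetime import datetime, timedelta


def hourly_cursor_count(start: datetime, end: datetime) -> int:
    """Number of HOUR buckets covering [start, end), assuming one cursor per bucket."""
    return math.ceil((end - start) / timedelta(hours=1))


def max_rows_on_compute_node(cursor_count: int, limit: int) -> int:
    """Upper bound on rows produced if each cursor yields up to `limit` rows."""
    return cursor_count * limit


limit = 10  # from "limit 10" in the query

# granularity=ALL: a single cursor for the whole interval
rows_all = max_rows_on_compute_node(1, limit)

# granularity=HOUR over one day and over one (non-leap) year
day_cursors = hourly_cursor_count(datetime(2021, 1, 1), datetime(2021, 1, 2))
year_cursors = hourly_cursor_count(datetime(2021, 1, 1), datetime(2022, 1, 1))
rows_day = max_rows_on_compute_node(day_cursors, limit)
rows_year = max_rows_on_compute_node(year_cursors, limit)

print(day_cursors, rows_day // rows_all)    # cursors and cost ratio for 1 day
print(year_cursors, rows_year // rows_all)  # cursors and cost ratio for 1 year
```

Under these assumptions the ratio of rows produced equals the number of hourly buckets in the interval, which is where the 24 and 8,760 figures come from.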
