adarshsanjeev commented on PR #14580: URL: https://github.com/apache/druid/pull/14580#issuecomment-1634358854
On the worker, this should be logged at most around 12 * (number of time chunks) times. On the controller, it should be at most (number of workers - 1) times for parallel and (number of workers - 1) * (number of time chunks) times for sequential, so it is more frequent for sequential.

I think this is quite rare (rarer than the numbers above might suggest). For the first downsample to trigger (one log message), we need around 300 MB of rows stored (each row is generally ~200 bytes), and each subsequent one would require about half as much. However, I'm not sure that it would not spam the logs in the worst cases. Thanks for raising this point.

I was interested in logging whether the job did indeed downsample, as this could be useful information (especially when running the entire job again with debug logs is not feasible). If logging on the order of the number of time chunks is too many log messages (it could be in some but not most cases), I'm okay with making this debug as well.

> especially since we are logging the whole keyCollector string with it.

The log message with the whole keyCollector is already a debug log, since that one could be called multiple times per downsample operation.
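To make the estimates above concrete, here is a rough back-of-the-envelope sketch. The function names are hypothetical and not part of Druid; the constants (12 per time chunk, ~300 MB, ~200 bytes per row) are taken from the comment, and 1 MB is assumed to mean 10^6 bytes.

```python
def rows_before_first_downsample(threshold_bytes=300 * 10**6, row_bytes=200):
    """Approximate number of rows stored before the first downsample triggers.

    Assumes ~300 MB of retained rows at ~200 bytes per row (figures from the comment).
    """
    return threshold_bytes // row_bytes


def max_downsample_logs(num_workers, num_time_chunks, sequential):
    """Rough upper bounds on downsample log messages, as (per_worker, controller).

    Worker bound: about 12 per time chunk.
    Controller bound: (workers - 1) for parallel, and
    (workers - 1) * time chunks for sequential.
    """
    per_worker = 12 * num_time_chunks
    controller = (num_workers - 1) * (num_time_chunks if sequential else 1)
    return per_worker, controller


if __name__ == "__main__":
    print(rows_before_first_downsample())
    print(max_downsample_logs(num_workers=4, num_time_chunks=10, sequential=False))
    print(max_downsample_logs(num_workers=4, num_time_chunks=10, sequential=True))
```

Under these assumptions, the first message needs roughly 1.5 million rows, so hitting the upper bounds would require a very large input per time chunk.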
