Re: [PR] Add logging for downsampling sketches in MSQ (druid)

via GitHub Thu, 13 Jul 2023 07:35:10 -0700


adarshsanjeev commented on PR #14580:
URL: https://github.com/apache/druid/pull/14580#issuecomment-1634358854


   On the worker, this should be logged at most around 12 * number of time 
chunks times, and on the controller at most (workers - 1) times for parallel 
and (workers - 1) * number of time chunks times for sequential (so it is more 
frequent for sequential).
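
   As a rough sketch of the bounds above (the function name and parameters are hypothetical and not part of the PR; the factor of 12 is taken from the estimate above):

```python
def max_downsample_logs(workers: int, time_chunks: int, sequential: bool) -> dict:
    """Approximate upper bounds on downsample log messages, per the estimate above.

    Hypothetical helper for illustration only; it is not Druid code.
    """
    # Worker: at most around 12 messages per time chunk.
    per_worker = 12 * time_chunks
    # Controller: (workers - 1) in parallel mode,
    # (workers - 1) * time_chunks in sequential mode.
    controller = (workers - 1) * (time_chunks if sequential else 1)
    return {"per_worker": per_worker, "controller": controller}
```

   For example, with 4 workers and 10 time chunks, the controller bound is 3 messages in parallel mode and 30 in sequential mode.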
   
   I think this is quite rare (rarer than the numbers above might suggest). For 
the first downsample to trigger (one log message), we need around 300MB of rows 
stored (each row is generally ~200 bytes), and each subsequent one would 
require about half as much.
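
   To put those figures in row counts, here is a back-of-the-envelope sketch. It assumes 1MB = 10^6 bytes and reads "half as much" as half of the first threshold for every later downsample; both are assumptions, and the names are hypothetical:

```python
FIRST_DOWNSAMPLE_BYTES = 300 * 1000 * 1000  # ~300MB, assuming 1MB = 10^6 bytes
BYTES_PER_ROW = 200                         # ~200 bytes per row, as estimated above


def rows_before_nth_downsample(n: int) -> int:
    """Approximate total rows stored before the n-th downsample triggers (n >= 1).

    Assumes each downsample after the first needs about half of the first
    threshold in additional rows. Illustrative only; not Druid code.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    first = FIRST_DOWNSAMPLE_BYTES // BYTES_PER_ROW  # 1,500,000 rows
    return first + (n - 1) * (first // 2)
```

   Under these assumptions, the first message needs roughly 1.5 million rows, and each further message roughly another 750,000.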
   
   However, I'm not sure that it would not spam the logs in the worst cases. 
Thanks for raising this point. I was interested in logging whether the job did 
indeed downsample, as this could be useful information (especially when running 
the entire job again with debug logs is not feasible). If a message count on 
the order of the number of time chunks is too many log messages (it could be in 
some but not most cases), I'm okay with making this debug as well.
   
   > especially since we are logging the whole keyCollector string with it.
   
   The log message with the whole keyCollector is already a debug log, since 
that one could be called multiple times per downsample operation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

