pchang388 commented on issue #12701: URL: https://github.com/apache/druid/issues/12701#issuecomment-1179351359
> @pchang388 - The effort that you have put into investigating and documenting this is commendable. I am not an expert on the protocol between the supervisor and the peon, so I am hoping that someone who is can comment. That said, I have fixed one relevant issue in 0.23.0 (#12167), so I would recommend an upgrade and then further troubleshooting.
>
> This will avoid those continuous retries from the supervisor to pause the peon. I would also suggest you take flame graphs on the peons when you run into this issue again. Here is an article on how to do that: https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs. That way, we will know where the peon is spending its time.

Hey @abhishekagarwal87, I appreciate the support! That sounds like a good idea. We are planning to do the upgrade sometime early-to-mid next week and are hoping the new version makes this go away permanently. These types of issues can be hard to troubleshoot since, in our case, the problem is _intermittent_ (a few days of running fine, followed by a few days of this problem occurring). I will post an update after the upgrade on where we stand with this specific issue.

And thank you for the flame graph profiling link. I was not aware this was something we could do, but it could definitely help us see more of what is happening when the "pause" does not actually go through. I will take a look and try it out if things are not resolved with the new version.
