imply-cheddar commented on PR #11760: URL: https://github.com/apache/druid/pull/11760#issuecomment-1162426852
ZK segment loading is broken right now. About 2 years ago, a PR was merged that broke the order of segment loading and dropping via ZK, such that assignment can deadlock when a cluster is mostly full. This wasn't a widespread issue (personally, I only learned about it ~6 months ago) because the largest clusters (at least those I'm aware of) have all been using HTTP segment assignment. https://github.com/apache/druid/pull/11717 has been merged. While it is and was a bug, it was a corner case that we've only seen in development environments and never in a production environment. Every cluster I touch, I move from ZK assignment to HTTP assignment because, in my experience, HTTP assignment is more stable. I'm +1 on this directionally, but the PR does need the tests fixed as Kashif suggested before it can be approved.
