gianm commented on PR #16425: URL: https://github.com/apache/druid/pull/16425#issuecomment-2104905648
Just reviewed the lists. By downgrading from 5.5 to 5.3 we do lose various fixes. These ones sound like they could be important:

- [[CURATOR-504](https://issues.apache.org/jira/browse/CURATOR-504)] - Race conditions in LeaderLatch after reconnecting to ensemble
- [[CURATOR-638](https://issues.apache.org/jira/browse/CURATOR-638)] - Curator disconnect from zookeeper when IPs change [seems especially relevant to k8s environments]
- [[CURATOR-644](https://issues.apache.org/jira/browse/CURATOR-644)] - CLONE - Race conditions in LeaderLatch after reconnecting to ensemble [a live-lock issue; we believe the fix for this bug introduced the split-brain problem; so rolling back would reintroduce the live-lock issue]
- [[CURATOR-649](https://issues.apache.org/jira/browse/CURATOR-649)] - Background exception was not retry-able or retry gave up [robustness]

With regard to downgrading Curator to 5.3 in Druid 30, I think we should be especially careful of these issues, and in particular CURATOR-638 and possible impact on k8s environments.

Alternate approaches do include:

- Stay with Curator 5.5 and work around this bug, such as by closing and recreating our LeaderLatch when ZK session changes.
- Stay with Curator 5.5 and _don't_ work around this bug; wait for release of Curator 5.7 which will hopefully include a fix.
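To make the first alternate approach concrete, here is a rough sketch of the "close and recreate on session change" idea. It is only an illustration under assumptions: `LatchRecreator`, its nested `Latch` interface, and `onReconnected` are hypothetical names, not Curator or Druid code. In a real change the latch would presumably be a Curator `LeaderLatch` and the callback would be driven from a connection state listener, but that wiring is not shown here.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: replaces a leader latch whenever the ZK session id
 * seen after a reconnect differs from the one the latch was created under.
 */
final class LatchRecreator {
    /** Hypothetical stand-in for a leader latch; not Curator's API. */
    interface Latch {
        void start();
        void close();
    }

    private final Supplier<Latch> factory;
    private Latch current;
    private long lastSessionId;
    private int recreations = 0;

    LatchRecreator(Supplier<Latch> factory, long initialSessionId) {
        this.factory = factory;
        this.lastSessionId = initialSessionId;
        this.current = factory.get();
        this.current.start();
    }

    /** Call after every reconnect with the session id observed at that point. */
    synchronized void onReconnected(long sessionId) {
        if (sessionId == lastSessionId) {
            // Same session: keep the existing latch as-is.
            return;
        }
        // New session: drop the old latch and start a fresh one.
        lastSessionId = sessionId;
        current.close();
        current = factory.get();
        current.start();
        recreations++;
    }

    /** Number of times the latch has been replaced so far. */
    synchronized int recreations() {
        return recreations;
    }
}
```

The sketch only recreates the latch when the session id actually changes, so a plain reconnect within the same session leaves the existing latch alone. Whether that is enough to avoid the split-brain problem would need to be checked against the actual Curator behavior.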
