Github user ivmaykov commented on the issue:
https://github.com/apache/zookeeper/pull/184
I think as long as we keep `portUnification=false` there should not be
hangs/crashes. However, that means it's not possible to safely upgrade a
cluster from plaintext quorum to TLS quorum without downtime. @hanm mentioned
"mitigations" but there really isn't a way to mitigate the issues w/
`UnifiedServerSocket` in #184 (other than "don't use it"). One option is to
keep the unified server socket code but not parse the `portUnification`
option in `QuorumPeerConfig`, so there is no way to enable the feature. Or we could
document the issues and have a clear warning ("portUnification has known
problems and may cause your ensemble to enter a bad state that requires
reboots, use at your own risk"), and let people take the risk if they like.
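To make the two options concrete, here is a rough sketch. It is not the actual `QuorumPeerConfig` code: the class name `PortUnificationGate` and both method names are made up for illustration, and the warning text is just the wording suggested above.

```java
import java.util.Properties;

// Hypothetical sketch only; this class does not exist in ZooKeeper.
public class PortUnificationGate {
    // Warning text taken from the wording proposed in this comment.
    static final String WARNING =
        "portUnification has known problems and may cause your ensemble to enter "
        + "a bad state that requires reboots, use at your own risk";

    // Option 1: never read the key, so the feature cannot be turned on
    // even if the config file sets portUnification=true.
    static boolean parseIgnoringOption(Properties cfg) {
        return false;
    }

    // Option 2: honor the key, but print a loud warning when it is enabled.
    static boolean parseWithWarning(Properties cfg) {
        boolean enabled =
            Boolean.parseBoolean(cfg.getProperty("portUnification", "false"));
        if (enabled) {
            System.err.println("WARN: " + WARNING);
        }
        return enabled;
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("portUnification", "true");
        System.out.println("option 1 enabled: " + parseIgnoringOption(cfg));
        System.out.println("option 2 enabled: " + parseWithWarning(cfg));
    }
}
```

With the first approach the `UnifiedServerSocket` code stays in the tree but is unreachable from configuration; with the second, operators can still opt in after seeing the warning.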
Another issue that we found is a >10% perf regression in plaintext mode when
the Apache HttpClient library dependency is added. We never figured out why it
caused the perf regression, but it could be a blocker. Unlike the
port unification bugs, this perf regression cannot be worked around through
configuration - it was present even when all the SSL features were disabled.
The fix for that is small and isolated to one file, so it would be pretty easy
to backport to #184 if desired.
Let me just do another pass over the differences between the two PRs and
see if anything else jumps out at me.
---