Github user mkedwards commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/707#discussion_r235467449
--- Diff:
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
---
@@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() {
}
public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean
writeToDisk){
- synchronized (QV_LOCK) {
- if (lastSeenQuorumVerifier != null &&
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
- LOG.error("setLastSeenQuorumVerifier called with stale
config " + qv.getVersion() +
- ". Current version: " +
quorumVerifier.getVersion());
+ // If qcm is non-null, we may call qcm.connectOne(), which will
take the lock on qcm
+ // and then take QV_LOCK. Take the locks in the same order to
ensure that we don't
+ // deadlock against other callers of connectOne(). If qcmRef gets
set in another
+ // thread while we're inside the synchronized block, that does no
harm; if we didn't
+ // take a lock on qcm (because it was null when we sampled it), we
won't call
+ // connectOne() on it. (Use of an AtomicReference is enough to
guarantee visibility
+ // of updates that provably happen in another thread before
entering this method.)
+ QuorumCnxManager qcm = qcmRef.get();
+ Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
--- End diff --
It does. I'm fairly sure that this guarantees that the `QuorumCnxManager`
lock can only be taken inside `QV_LOCK` if it's already held outside it.
Perhaps we can get confirmation from @castuardo and his team's Distributed
System Model Checking tool?
---