[
https://issues.apache.org/jira/browse/ZOOKEEPER-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Henry Robinson updated ZOOKEEPER-1361:
--------------------------------------
Attachment: ZOOKEEPER-1361.patch
This patch cleans up some of the synchronisation around Leader's member
variables:
* Member variables are now private to Leader, and are accessed with getters
* Synchronisation (mostly) happens only in those getters, so locks are held for
less long
* As a result, all accesses are now correctly synchronised
Note that the behaviour of the code has changed slightly: where previously some
methods would hold onto e.g. the Leader.learners lock for the entire loop, now
the same method will copy the list of learners and then iterate over the copy
without holding the lock.
> Leader.lead iterates over 'learners' set without proper synchronisation
> -----------------------------------------------------------------------
>
> Key: ZOOKEEPER-1361
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1361
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.2
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1361.patch
>
>
> This block:
> {code}
> HashSet<Long> followerSet = new HashSet<Long>();
> for(LearnerHandler f : learners)
> followerSet.add(f.getSid());
> {code}
> is executed without holding the lock on learners, so if there were ever a
> condition where a new learner was added during the initial sync phase, I'm
> pretty sure we'd see a concurrent modification exception. Certainly other
> parts of the code are very careful to lock on learners when iterating.
> It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead,
> but I can't convince myself that this wouldn't introduce some correctness
> bugs. For example the following:
> Learners contains A, B, C, D
> Thread 1 iterates over learners, and gets as far as B.
> Thread 2 removes A, and adds E.
> Thread 1 continues iterating and sees a learner view of A, B, C, D, E
> This may be a bug if Thread 1 is counting the number of synced followers for
> a quorum count, since at no point was A, B, C, D, E a correct view of the
> quorum.
> In practice, I think this is actually ok, because I don't think ZK makes any
> strong ordering guarantees on learners joining or leaving (so we don't need a
> strong serialisability guarantee on learners) but I don't think I'll make
> that change for this patch. Instead I want to clean up the locking protocols
> on the follower / learner sets - to avoid another easy deadlock like the one
> we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy
> and then iterate over the copy rather than iterate over a locked set.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira