[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449021#comment-13449021
 ] 

Camille Fournier commented on ZOOKEEPER-1361:
---------------------------------------------

Created a patch for 3.4 based on this fix. Ross, can you apply it to your 
version and see if it fixes the problem you're seeing? Henry, can you glance at 
it and approve or propose changes?
                
> Leader.lead iterates over 'learners' set without proper synchronisation
> -----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1361
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1361
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.2
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1361-3.4.patch, 
> ZOOKEEPER-1361-no-whitespace.patch, ZOOKEEPER-1361.patch
>
>
> This block:
> {code}
> HashSet<Long> followerSet = new HashSet<Long>();
> for(LearnerHandler f : learners)
>     followerSet.add(f.getSid());
> {code}
> is executed without holding the lock on learners, so if there were ever a 
> condition where a new learner was added during the initial sync phase, I'm 
> pretty sure we'd see a concurrent modification exception. Certainly other 
> parts of the code are very careful to lock on learners when iterating. 
> It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead, 
> but I can't convince myself that this wouldn't introduce some correctness 
> bugs. For example the following:
> Learners contains A, B, C, D
> Thread 1 iterates over learners, and gets as far as B.
> Thread 2 removes A, and adds E.
> Thread 1 continues iterating and sees a learner view of A, B, C, D, E
> This may be a bug if Thread 1 is counting the number of synced followers for 
> a quorum count, since at no point was A, B, C, D, E a correct view of the 
> quorum.
> In practice, I think this is actually ok, because I don't think ZK makes any 
> strong ordering guarantees on learners joining or leaving (so we don't need a 
> strong serialisability guarantee on learners) but I don't think I'll make 
> that change for this patch. Instead I want to clean up the locking protocols 
> on the follower / learner sets - to avoid another easy deadlock like the one 
> we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy 
> and then iterate over the copy rather than iterate over a locked set. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to