Simon Wang created CURATOR-330:
----------------------------------

             Summary: Need a way to handle connection lost while entering 
double barrier
                 Key: CURATOR-330
                 URL: https://issues.apache.org/jira/browse/CURATOR-330
             Project: Apache Curator
          Issue Type: Improvement
          Components: Recipes
    Affects Versions: 2.10.0
            Reporter: Simon Wang


Here is the problem I’m running into:

Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3),
each running on the same machine as one of the ZooKeeper nodes. They use a
double barrier for coordination.

Client 1 enters the barrier and waits for the other 2. The other 2 nodes then
go down, the ensemble loses quorum, and Client 1 gets a
ConnectionLossException from enter(). That’s expected.

After a while the other 2 nodes come back, and all clients need to retry the
operation and reenter the same barrier (creating a new barrier instead would
likely make things more complex). Here is the problem:

If the session for Client 1 is still alive, Client 1 calling enter() again
will get a NodeExistsException, because the ephemeral node belonging to that
session has not been deleted yet.

What should I do on the application side in this case? Alternatively, could we
add a mechanism to reenter the barrier that skips creating the child node for
this client if it already exists?
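
To illustrate the mechanism I have in mind, here is a rough sketch. It is a toy
in-memory model only: FakeBarrierNode, enterOrReenter and the exception class
are hypothetical names I made up for illustration, not Curator or ZooKeeper
API. The idea is that a retry treats "node exists" as success when the existing
node is owned by the caller's own session. In a real client the ownership check
would presumably compare the znode's ephemeral owner with the current session
id, but that part is an assumption on my side.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Curator or ZooKeeper classes.
class BarrierReentrySketch {

    /** Hypothetical stand-in for the "node exists" failure described above. */
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) {
            super("Node already exists: " + path);
        }
    }

    /** Hypothetical stand-in for the barrier's parent node: child path -> owning session id. */
    static class FakeBarrierNode {
        private final Map<String, Long> ephemeralOwners = new HashMap<>();

        // Strict create: fails if the child is already present.
        void createEphemeral(String path, long sessionId) {
            if (ephemeralOwners.containsKey(path)) {
                throw new NodeExistsException(path);
            }
            ephemeralOwners.put(path, sessionId);
        }

        // Returns the owning session id, or null if the child does not exist.
        Long ownerOf(String path) {
            return ephemeralOwners.get(path);
        }

        int memberCount() {
            return ephemeralOwners.size();
        }

        // Models session expiry: every ephemeral child of that session disappears.
        void expireSession(long sessionId) {
            ephemeralOwners.values().removeIf(owner -> owner == sessionId);
        }
    }

    /**
     * Creates our child node, or reuses it if it survived from an earlier attempt
     * made by the same session. Returns true if the node was created now, false
     * if an existing node was reused.
     */
    static boolean enterOrReenter(FakeBarrierNode barrier, String ourPath, long sessionId) {
        try {
            barrier.createEphemeral(ourPath, sessionId);
            return true;
        } catch (NodeExistsException e) {
            Long owner = barrier.ownerOf(ourPath);
            if (owner != null && owner == sessionId) {
                return false; // our own node is still there: skip creation, keep waiting
            }
            throw e; // the node belongs to another session: a genuine conflict
        }
    }

    public static void main(String[] args) {
        FakeBarrierNode barrier = new FakeBarrierNode();
        long session = 42L;
        String ourPath = "/barrier/client-1";

        System.out.println(enterOrReenter(barrier, ourPath, session)); // true: node created
        System.out.println(enterOrReenter(barrier, ourPath, session)); // false: node reused
        System.out.println(barrier.memberCount());                     // 1: no duplicate member
    }
}
```

With something like this, the member count seen by the other clients stays
correct after a retry, because the surviving node is counted once rather than
being rejected or duplicated.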

I would like to open a Jira for this if required. 

Thanks,
Simon




