[
https://issues.apache.org/jira/browse/CURATOR-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Wang updated CURATOR-330:
-------------------------------
Description:
Here is the problem I’m running into:
Assume a 3-node ensemble. My application has 3 clients (Client 1, 2 and 3),
each running on the same host as one of the ZooKeeper nodes. They use a double
barrier for coordination.
Client 1 is entering the barrier and waiting for the other 2. The other 2
nodes then go down, so the ensemble loses quorum and Client 1 gets a
ConnectionLossException from enter(). That’s expected.
After a while the other 2 nodes come back, and all clients need to retry the
operation and re-enter the same barrier (it might become more complex if a new
barrier had to be created). Here is the problem:
If the session for Client 1 is still alive, Client 1 calling enter() again
will get a NodeExistsException, because the ephemeral node belonging to that
session has not been deleted yet.
What should I do on the application side in this case? Alternatively, could we
add a mechanism to re-enter the barrier that skips creating the child node for
this client if it already exists?
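To make the proposal concrete, here is a minimal sketch of the "skip creation if my node already exists" idea. It does not use Curator or ZooKeeper; the class ReentrantBarrierSketch, its methods, and the in-memory map standing in for the barrier's child znodes are all hypothetical names made up for illustration, and the exception here is an unchecked stand-in for the real one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of a barrier's ephemeral child nodes.
// This is NOT Curator's DistributedDoubleBarrier, only an illustration.
class ReentrantBarrierSketch {

    // Unchecked stand-in for ZooKeeper's NodeExistsException.
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) {
            super("Node already exists: " + path);
        }
    }

    // Child path -> id of the session that owns the ephemeral node.
    private final Map<String, Long> ephemeralNodes = new ConcurrentHashMap<>();
    private final int memberCount;

    ReentrantBarrierSketch(int memberCount) {
        this.memberCount = memberCount;
    }

    // Creates the child node, or reuses it when it already belongs to the
    // calling session. Returns true if a node was created, false if reused.
    boolean createOrReuse(String path, long sessionId) {
        Long owner = ephemeralNodes.putIfAbsent(path, sessionId);
        if (owner == null) {
            return true;   // first entry: node created
        }
        if (owner == sessionId) {
            return false;  // same session re-entering: skip creation
        }
        // The node belongs to another session, so this is a real conflict.
        throw new NodeExistsException(path);
    }

    // True once at least memberCount children are present.
    boolean allEntered() {
        return ephemeralNodes.size() >= memberCount;
    }

    // Models session expiry: every ephemeral node of the session disappears.
    void expireSession(long sessionId) {
        ephemeralNodes.values().removeIf(id -> id == sessionId);
    }
}
```

In a real client, the ownership check could presumably compare the existing node's ephemeral owner (Stat.getEphemeralOwner()) with the current session id (ZooKeeper.getSessionId()); whether that fits inside the barrier recipe is an open question for the maintainers.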
Thanks,
Simon
was:
Here is the problem I’m meeting:
Assuming 3 node ensemble, my application has 3 clients and each one runs on
same zk node (Client 1, 2 and 3). They use double barrier for coordination.
Client 1 is entering the barrier and waiting for the other 2. Now the other 2
nodes are down and then the ensemble gets crashed and the client 1 gets
LostConnectionException from enter(). That’s expected.
After while the other 2 nodes come back, all clients need to retry operation
and reenter the same barrier (It might become more complex if creating a new
barrier). Here is the problem:
If the session for client 1 is still alive, Client 1 calling enter method will
get NodeExistException as the ephemeral node corresponding to that session is
not deleted yet.
I wonder in this case what should I do from application side? Or I’m thinking
can we add a mechanism to reenter the barrier but skip creating child node for
this client if that exists?
I would like to open a Jira for this if required.
Thanks,
Simon
> Need a way to handle connection lost while entering double barrier
> ------------------------------------------------------------------
>
> Key: CURATOR-330
> URL: https://issues.apache.org/jira/browse/CURATOR-330
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 2.10.0
> Reporter: Simon Wang