Re: How to obtain stable leader election over unstable ZK connections

Scott Blum Thu, 20 Aug 2015 08:14:28 -0700

Ahh... that is confusing, and seems dubiously useful.  I think 99% of the
time I'd rather get an event that represents that the session is definitely
lost.


On Thu, Aug 20, 2015 at 10:53 AM, Jordan Zimmerman <
[email protected]> wrote:

> Maybe I'm confused, but I thought that's what ConnectionState SUSPENDED vs.
> LOST was all about?
>
> It’s a big source of confusion with Curator. LOST does _not_ mean the
> session was lost. It means Curator has given up after retries, etc. Because
> Curator re-creates ZK handles internally the notion of a “session” is more
> complicated than using raw ZooKeeper.
>
>
> -Jordan
>
>
>
>
> On August 20, 2015 at 9:50:56 AM, Scott Blum ([email protected])
> wrote:
>
> Maybe I'm confused, but I thought that's what ConnectionState SUSPENDED vs.
> LOST was all about?
>
> Maybe the recipes just need to be tweaked a bit?
>
> I always assumed ephemeral nodes would be gone on LOST but not gone if you
> get a SUSPENDED followed by RECONNECTED.
>
> The one question I've always wondered about is what happens to Watchers on
> SUSPENDED: do they all need to be re-applied, or will they still fire later
> as long as you don't get LOST?
>
> On Thu, Aug 20, 2015 at 10:41 AM, Jordan Zimmerman <
> [email protected]> wrote:
>
> > I wonder if we can add error handling policies to Curator. Currently, the
> > policy of all recipes is hard-coded to treat SUSPENDED as a type of lost
> > session. We could change this to be injected like the retry policy. To
> > solve this particular issue we’d also need to introduce a SESSION_LOST
> > state of some type. This is complicated as Curator re-creates connections
> > internally.
> >
> > Thoughts?
> >
> > -Jordan
> >
> >
> >
> > On August 20, 2015 at 2:10:52 AM, Dong Lei ([email protected]) wrote:
> >
> > Hi curator-devs:
> >
> > We use Spark in standalone mode, in which Spark leverages Curator to manage
> > ZK connections and elect a leader. Our ZooKeeper may not be very stable, and
> > we sometimes get "session suspended and reconnected". The problem is that
> > this kind of disconnect and reconnect triggers leader election quite
> > often, and Spark's reaction to a leadership switch can be very costly.
> >
> > So I'm wondering whether it's possible to tolerate such failure cases
> > if we can reconnect soon and the session is actually kept after the
> > reconnection.
> > Does such a requirement make sense to you?
> >
> > Any advice will be appreciated.
> >
> >
> > Thanks
> > Dong Lei
> >
> >
>
>
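
The idea discussed in the thread (an injectable policy that does not treat
every SUSPENDED as a lost session) can be sketched roughly as follows. This is
only an illustration under stated assumptions, not Curator's API: LinkState,
SuspendTolerantPolicy, and the grace-period parameter are hypothetical names
invented for the example, and timestamps are passed in explicitly so the logic
can be exercised without a ZooKeeper connection.

```java
import java.util.OptionalLong;

/** Hypothetical model of the states named in the thread; NOT Curator's ConnectionState. */
enum LinkState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

/**
 * Hypothetical policy: keep leadership across a SUSPENDED that is followed by a
 * reconnect within a grace period; give it up on LOST or once the grace period
 * has elapsed without a reconnect.
 */
final class SuspendTolerantPolicy {
    private final long graceMillis;
    private OptionalLong suspendedAt = OptionalLong.empty();
    private boolean leader;

    SuspendTolerantPolicy(long graceMillis, boolean initiallyLeader) {
        this.graceMillis = graceMillis;
        this.leader = initiallyLeader;
    }

    /** Feed a state change together with the time (in ms) at which it was observed. */
    void onStateChange(LinkState state, long nowMillis) {
        switch (state) {
            case SUSPENDED:
                // Remember only the first SUSPENDED; repeats must not restart the clock.
                if (!suspendedAt.isPresent()) {
                    suspendedAt = OptionalLong.of(nowMillis);
                }
                break;
            case CONNECTED:
            case RECONNECTED:
                // A reconnect that arrives too late does not restore leadership.
                if (graceExpired(nowMillis)) {
                    leader = false;
                }
                suspendedAt = OptionalLong.empty();
                break;
            case LOST:
                leader = false;
                suspendedAt = OptionalLong.empty();
                break;
        }
    }

    /** True while leadership is still assumed to be held at the given time. */
    boolean isLeader(long nowMillis) {
        if (graceExpired(nowMillis)) {
            leader = false;
        }
        return leader;
    }

    private boolean graceExpired(long nowMillis) {
        return suspendedAt.isPresent()
                && nowMillis - suspendedAt.getAsLong() >= graceMillis;
    }
}
```

In a sketch like this the grace period would have to stay well below the
ZooKeeper session timeout: once the session expires the ephemeral node is
removed, and another participant may already have taken leadership.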
