Ahh... that is confusing, and seems dubiously useful. I think 99% of the time I'd rather get an event that represents that the session is definitely lost.
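One way to approximate a "session definitely lost" signal, as a rough sketch only: remember the ZooKeeper session id and compare it whenever the connection comes back. The class below is hypothetical and is not part of Curator. Its `State` enum only mirrors the names of Curator's `ConnectionState` so the snippet compiles on its own, and it assumes the caller can supply the current session id (with Curator that would presumably come from something like `client.getZookeeperClient().getZooKeeper().getSessionId()`).

```java
/**
 * Hypothetical helper, not a Curator class. State mirrors the names of
 * Curator's ConnectionState so this sketch is self-contained.
 */
class SessionLossDetector {
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private boolean hasSession = false;
    private long lastSessionId;

    /**
     * Feed every connection state change here along with the session id
     * the ZooKeeper handle reports at that moment (assumed to be
     * available to the caller).
     *
     * @return true only when a reconnect came back with a different
     *         session id, i.e. the old session and its ephemeral nodes
     *         are gone.
     */
    synchronized boolean onStateChange(State newState, long currentSessionId) {
        switch (newState) {
            case CONNECTED:
                // First connection: just remember the session.
                hasSession = true;
                lastSessionId = currentSessionId;
                return false;
            case RECONNECTED:
                // Same id: the session survived the disconnect.
                // Different id: the old session was replaced.
                boolean replaced = hasSession && lastSessionId != currentSessionId;
                hasSession = true;
                lastSessionId = currentSessionId;
                return replaced;
            default:
                // SUSPENDED and LOST say nothing definite about the
                // session on their own, so they are not reported here.
                return false;
        }
    }
}
```

A recipe using something like this would keep its leadership across SUSPENDED followed by RECONNECTED with the same id, and only give it up when `onStateChange` returns true. It would still need some timeout for the case where the connection never comes back.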

On Thu, Aug 20, 2015 at 10:53 AM, Jordan Zimmerman <[email protected]> wrote:

> Maybe I'm confused, but I thought that's what ConnectionState SUSPENDED
> vs. LOST was all about?
>
> It’s a big source of confusion with Curator. LOST does _not_ mean the
> session was lost. It means Curator has given up after retries, etc.
> Because Curator re-creates ZK handles internally the notion of a
> “session” is more complicated than using raw ZooKeeper.
>
> -Jordan
>
> On August 20, 2015 at 9:50:56 AM, Scott Blum ([email protected]) wrote:
>
> Maybe I'm confused, but I thought that's what ConnectionState SUSPENDED
> vs. LOST was all about?
>
> Maybe the recipes just need to be tweaked a bit?
>
> I always assumed ephemeral nodes would be gone on LOST but not gone if
> you get a SUSPENDED followed by RECONNECTED.
>
> The one question I've always wondered is what happens to Watchers on
> SUSPENDED, do they all need to be re-applied, or will they still fire
> later as long as you don't get LOST?
>
> On Thu, Aug 20, 2015 at 10:41 AM, Jordan Zimmerman <[email protected]> wrote:
>
> > I wonder if we can add error handling policies to Curator. Currently,
> > the policy of all recipes is hard-coded to treat SUSPENDED as a type
> > of lost session. We could change this to be injected like the retry
> > policy. To solve this particular issue we’d also need to introduce a
> > SESSION_LOST state of some type. This is complicated as Curator
> > re-creates connections internally.
> >
> > Thoughts?
> >
> > -Jordan
> >
> > On August 20, 2015 at 2:10:52 AM, Dong Lei ([email protected]) wrote:
> >
> > Hi curator-devs:
> >
> > We use Spark in standalone mode, in which Spark leverages Curator to
> > manage ZK connections and elect a leader. Our ZooKeeper may not be
> > very stable, and we sometimes get "session suspended and reconnected".
> > The problem is that this kind of disconnect and reconnect triggers
> > leader election quite often. And Spark's reaction to leadership
> > switching can be very costly.
> >
> > So I'm thinking about whether it's possible to tolerate such failure
> > cases if we can reconnect soon and the session is actually kept after
> > the reconnection?
> > Or does such a requirement make sense to you?
> >
> > Any advice will be appreciated.
> >
> > Thanks
> > Dong Lei
