Most HA frameworks fail over when they lose leadership.

I don't understand the semantics of SUSPENDED in Curator, but the
conservative thing to do would be to fail over when it occurs, if it means
that you should not be taking any leader-related actions. This really
depends on the Curator leader election semantics, though.

The semantics of SUSPENDED seem a bit strange to me as it says "There has
been a loss of connection. Leaders, locks, etc. should suspend until the
connection is re-established. If the connection times-out you will receive
a LOST notice". From this description, one must assume that leadership
*may* have been lost. Have you reached out to the Curator devs?

You can't stop() and then start() the same driver again (although the
names would lead one to think otherwise), so resuming after a stop() means
creating a new MesosSchedulerDriver.
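
For what it's worth, here is a rough sketch of the conservative approach,
not a tested integration. It assumes SUSPENDED should be treated like LOST,
and every name in it (ConnState, Driver, LeaderGate) is a hypothetical
stand-in rather than a Curator or Mesos type. The point is only that the
driver is stopped and dropped on SUSPENDED/LOST, and a fresh one is built
when leadership is granted again.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the connection states discussed in this thread. */
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

/** Hypothetical minimal driver view: each instance is started once and stopped once. */
interface Driver {
    void start();
    void stop();
}

/** Conservative policy (an assumption): SUSPENDED and LOST both mean "stop acting as leader". */
final class LeaderGate {
    private final Supplier<Driver> driverFactory;
    private Driver active; // null while this process is not acting as leader

    LeaderGate(Supplier<Driver> driverFactory) {
        this.driverFactory = driverFactory;
    }

    /** Call when the election grants leadership; always builds a fresh driver. */
    synchronized void onLeadershipAcquired() {
        if (active == null) {
            active = driverFactory.get();
            active.start();
        }
    }

    /** Call on every connection state change. */
    synchronized void onConnectionState(ConnState state) {
        boolean mayHaveLostLeadership =
                state == ConnState.SUSPENDED || state == ConnState.LOST;
        if (mayHaveLostLeadership && active != null) {
            active.stop();
            active = null; // a stopped driver is never reused
        }
        // RECONNECTED deliberately does nothing: wait for the election
        // to grant leadership again before building a new driver.
    }

    synchronized boolean isActing() {
        return active != null;
    }
}
```

In a real framework, a Curator ConnectionStateListener would presumably
forward state changes to something like onConnectionState(), and the
factory would construct a new MesosSchedulerDriver each time.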


On Wed, Apr 16, 2014 at 11:23 AM, David Greenberg <[email protected]> wrote:

> Hello Mesos devs,
> I'm trying to integrate leader election into the framework I already wrote.
> The framework uses a shared database, and will be fine as long as at most
> one copy is running at a given time. I am using Curator to provide the
> leader election. What I'm not sure about is how to handle when the curator
> goes into a SUSPENDED state--I'd like to stop the driver, and restart it
> once the connection RECONNECTs, but I'm not sure if I can do that without
> creating a whole new MesosSchedulerDriver. Is this the right way to go
> about it? Should I just ignore SUSPENDED? What do other frameworks do?
>
