Yan Xu created MESOS-1265:
-----------------------------

             Summary: Group should not process enqueued events from previous 
ZooKeeper instance (and ZK session)
                 Key: MESOS-1265
                 URL: https://issues.apache.org/jira/browse/MESOS-1265
             Project: Mesos
          Issue Type: Bug
            Reporter: Yan Xu
             Fix For: 0.19.0


This issue has caused MESOS-1258 and MESOS-1239 on loaded systems.

Basically, after Group *realizes* or *locally determines* that the ZK session 
has expired, it deletes the ZooKeeper client (zk0) and starts a new instance 
(zk1). However, events from the previous instance zk0 can still remain in 
GroupProcess' event queue. Group then handles these already enqueued events 
incorrectly, because it assumes they come from the new ZooKeeper instance 
(zk1).

A straightforward solution is to have Group check the ZK sessionId associated 
with each event and drop events from previous sessions.

The current Group event handlers don't allow the ZK Watcher to inform Group 
about the sessionId of a particular event.
{noformat:title=Current Group's ZK event handlers}
  void connected(bool reconnect);
  void reconnecting();
  void expired();
  void updated(const std::string& path);
  void created(const std::string& path);
  void deleted(const std::string& path);
{noformat}

But we can add a sessionId parameter to each handler:
{noformat:title=Proposed Group's ZK event handlers}
  void connected(int64_t sessionId, bool reconnect);
  void reconnecting(int64_t sessionId);
  void expired(int64_t sessionId);
  void updated(int64_t sessionId, const std::string& path);
  void created(int64_t sessionId, const std::string& path);
  void deleted(int64_t sessionId, const std::string& path);
{noformat}
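
To illustrate how the proposed handlers could drop stale events, here is a 
minimal sketch. It is not the actual GroupProcess code: the class GroupSketch, 
its reset() method, and the currentSessionId, handled and dropped members are 
all hypothetical names introduced for illustration. It also assumes Group can 
learn the session ID of its live ZooKeeper client when it replaces the client. 
Only three of the six handlers are shown; the others would follow the same 
pattern.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for GroupProcess; NOT the real Mesos class.
class GroupSketch {
public:
  // Assumption: called when Group replaces its ZooKeeper client, with the
  // session ID of the new (live) client.
  void reset(int64_t sessionId) { currentSessionId = sessionId; }

  // Handlers with the proposed signatures: each one first checks whether the
  // event belongs to the current session and ignores it otherwise.
  void connected(int64_t sessionId, bool reconnect) {
    if (stale(sessionId)) return;
    handled.push_back(reconnect ? "reconnected" : "connected");
  }

  void expired(int64_t sessionId) {
    if (stale(sessionId)) return;
    handled.push_back("expired");
  }

  void updated(int64_t sessionId, const std::string& path) {
    if (stale(sessionId)) return;
    handled.push_back("updated " + path);
  }

  std::vector<std::string> handled;  // Events that were acted upon.
  int dropped = 0;                   // Events ignored as stale.

private:
  // Returns true (and counts the drop) if the event comes from a session
  // other than the one Group currently considers live.
  bool stale(int64_t sessionId) {
    if (sessionId == currentSessionId) return false;
    ++dropped;
    return true;
  }

  int64_t currentSessionId = 0;
};
```

With this shape, an updated() or expired() event that zk0 enqueued before 
Group switched to zk1 carries zk0's session ID, fails the check, and is 
ignored instead of being attributed to zk1.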



--
This message was sent by Atlassian JIRA
(v6.2#6252)
