Yan Xu created MESOS-1265:
-----------------------------
Summary: Group should not process enqueued events from previous
ZooKeeper instance (and ZK session)
Key: MESOS-1265
URL: https://issues.apache.org/jira/browse/MESOS-1265
Project: Mesos
Issue Type: Bug
Reporter: Yan Xu
Fix For: 0.19.0
This issue has caused MESOS-1258 and MESOS-1239 on loaded systems.
Basically, after Group *realizes* or *locally determines* that the ZK session has
expired, it deletes the ZooKeeper client (zk0) and starts a new instance (zk1).
However, events from the previous instance zk0 may still remain in
GroupProcess's event queue. Group handles these already-enqueued events as if
they came from the new ZooKeeper instance (zk1), which leads to incorrect
assumptions.
A straightforward solution is to have Group check the ZK sessionId associated
with each event and drop events from previous sessions.
The current Group event handlers don't let the ZK Watcher tell Group which
sessionId a particular event belongs to.
{noformat:title=Current Group's ZK event handlers}
void connected(bool reconnect);
void reconnecting();
void expired();
void updated(const std::string& path);
void created(const std::string& path);
void deleted(const std::string& path);
{noformat}
But we can add a sessionId parameter to each handler:
{noformat:title=Proposed Group's ZK event handlers}
void connected(int64_t sessionId, bool reconnect);
void reconnecting(int64_t sessionId);
void expired(int64_t sessionId);
void updated(int64_t sessionId, const std::string& path);
void created(int64_t sessionId, const std::string& path);
void deleted(int64_t sessionId, const std::string& path);
{noformat}
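A minimal sketch of how the sessionId check could look with the proposed signatures. This is not the actual GroupProcess code: GroupSketch, startSession(), stale(), replayScenario(), and the "/group/member" path are hypothetical names used only for illustration, and the event queue is modeled with a plain std::deque rather than libprocess.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GroupProcess (assumption, not the Mesos class).
class GroupSketch {
public:
  // Called whenever Group creates a new ZooKeeper client (e.g. zk0 -> zk1).
  void startSession(int64_t sessionId) { currentSessionId = sessionId; }

  void connected(int64_t sessionId, bool /*reconnect*/) {
    if (stale(sessionId)) return;
    handled.push_back("connected");
  }

  void expired(int64_t sessionId) {
    if (stale(sessionId)) return;
    handled.push_back("expired");
  }

  void updated(int64_t sessionId, const std::string& path) {
    if (stale(sessionId)) return;
    handled.push_back("updated " + path);
  }

  std::vector<std::string> handled;  // Events that were actually processed.
  int dropped = 0;                   // Events discarded as stale.

private:
  // An event is stale if it was produced under a session other than the
  // one belonging to the current ZooKeeper client.
  bool stale(int64_t sessionId) {
    if (sessionId == currentSessionId) return false;
    ++dropped;
    return true;
  }

  int64_t currentSessionId = 0;
};

// Replays the situation from this issue: events from session 1 (zk0) are
// still queued when Group switches to session 2 (zk1).
GroupSketch replayScenario() {
  GroupSketch group;
  std::deque<std::function<void(GroupSketch&)>> queue;

  group.startSession(1);  // zk0
  queue.push_back([](GroupSketch& g) { g.updated(1, "/group/member"); });
  queue.push_back([](GroupSketch& g) { g.expired(1); });

  // Group locally determines expiration and starts zk1 before the queue
  // has been drained.
  group.startSession(2);  // zk1
  queue.push_back([](GroupSketch& g) { g.connected(2, false); });
  queue.push_back([](GroupSketch& g) { g.updated(2, "/group/member"); });

  while (!queue.empty()) {
    queue.front()(group);
    queue.pop_front();
  }
  return group;
}
```

In this sketch the two events tagged with session 1 are dropped and only the two events from session 2 reach the handler bodies. The guard lives in each handler, so the Watcher only has to pass along the sessionId it already knows.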
--
This message was sent by Atlassian JIRA
(v6.2#6252)