Sean Zhong created GEARPUMP-8:
---------------------------------
Summary: Two machines can possibly have the same worker ID when the master
restarts in a single-master cluster
Key: GEARPUMP-8
URL: https://issues.apache.org/jira/browse/GEARPUMP-8
Project: Apache Gearpump
Issue Type: Bug
Reporter: Sean Zhong
*Why should we NOT allow duplicate worker IDs?*
We use the worker ID to track the resources of a single machine. If two
machines have the same worker ID, the master cannot tell their resources
apart, which creates a lot of confusion.
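To make the confusion concrete, here is a minimal Python sketch (an illustration, not Gearpump's actual code) of a master that tracks per-machine resources in a map keyed by worker ID. The `Resource` type, its `slots` field, and the `register` function are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical resource record; "slots" stands in for a machine's resources.
@dataclass
class Resource:
    host: str
    slots: int


# The master's view of the cluster: one entry per worker ID.
resources: dict[int, Resource] = {}


def register(worker_id: int, resource: Resource) -> None:
    # A dict keeps one value per key, so a duplicate worker ID
    # silently replaces the earlier machine's entry.
    resources[worker_id] = resource


register(0, Resource("machine-a", 4))
register(0, Resource("machine-b", 8))  # a different machine with the same ID

print(resources)  # prints: {0: Resource(host='machine-b', slots=8)}
```

The resources of machine-a are no longer visible to the master, even though that machine is still running.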
*What is the precondition for triggering this issue?*
This happens when the cluster has only one master and that master restarts.
If the cluster has multiple masters, it is not affected by this issue.
*How does this issue happen?*
When the master restarts, there is no other master machine to provide HA, so
the master's state is lost, including the list of worker IDs already occupied
by existing workers. When a new worker machine then joins, it gets a fresh
worker ID starting from 0, which may conflict with the ID of an existing
worker machine.
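The failure sequence can be reproduced with a minimal Python sketch, assuming a master that hands out sequential IDs from an in-memory counter. The `Master` class and `register_worker` method are hypothetical; a restart with no HA peer is modeled as creating a fresh `Master` object.

```python
import itertools


class Master:
    """Hypothetical single master that assigns sequential worker IDs.

    The counter lives only in memory, so it is lost on restart.
    """

    def __init__(self) -> None:
        self._next_id = itertools.count(0)

    def register_worker(self) -> int:
        return next(self._next_id)


master = Master()
old_ids = [master.register_worker() for _ in range(3)]  # existing workers 0, 1, 2

# Restart with no other master to recover state from: the counter starts over.
master = Master()
new_id = master.register_worker()  # a newly joined worker machine

print(old_ids, new_id)  # prints: [0, 1, 2] 0
```

The new machine receives ID 0, which is still held by one of the existing workers.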
*Suggested fix?*
Instead of using only the sequence 0, 1, 2, 3, 4, ... as the worker ID, we
append a timestamp: the time at which the worker registers itself with the
master. For example:
{quote}
WorkerId(0, timestamp1)
WorkerId(1, timestamp2)
...
{quote}
Then, after the master restarts, a new worker and an old worker can be
distinguished by their timestamps, since their registration times differ.
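A minimal Python sketch of the suggested ID, assuming it is a value type made of the sequence number and the registration time in milliseconds. The `WorkerId` field names `seq` and `register_time` are hypothetical, and the timestamps are illustrative values, not real registration data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerId:
    """Hypothetical worker ID: sequence number plus registration timestamp."""

    seq: int            # assigned by the master; restarts from 0 after a master restart
    register_time: int  # illustrative: ms since the epoch when the worker registered


# Worker registered before the master restart (illustrative timestamp).
old = WorkerId(0, register_time=1_450_000_000_000)
# Worker registered after the restart: same sequence number, later timestamp.
new = WorkerId(0, register_time=1_450_000_360_000)

# frozen=True gives value equality and hashing, so WorkerId can key a dict.
resources = {old: "machine-a", new: "machine-b"}

print(old.seq == new.seq, old == new, len(resources))  # prints: True False 2
```

In a real master the timestamp would come from the clock at registration time (for example `time.time_ns() // 1_000_000` in Python). The scheme assumes the clock advances across a master restart, so that a worker registered before the restart and one registered after it never share both the sequence number and the timestamp.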
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)