Sean Zhong created GEARPUMP-8:
---------------------------------

             Summary: Two machines can possibly have the same worker Id when the 
master restarts in a single-master cluster
                 Key: GEARPUMP-8
                 URL: https://issues.apache.org/jira/browse/GEARPUMP-8
             Project: Apache Gearpump
          Issue Type: Bug
            Reporter: Sean Zhong



*Why should we NOT allow duplicate worker ids?*
We use the worker Id to track the resources of a single machine. If two machines 
have the same worker id, their resources can no longer be tracked separately.

*What pre-condition triggers this issue?*
This happens when the cluster has only one master and that master is 
restarting. 
If the cluster has multiple masters, it is not affected by this issue.

*How does this issue happen?*
When the master restarts and there are no other master machines for HA, the 
master's state is lost, including the list of worker ids already occupied by 
existing workers. When a new worker machine then joins, it gets a fresh worker 
Id starting from 0, which can conflict with the id of an existing worker 
machine.
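
The sequence of events above can be reproduced with a toy model. This is only a minimal sketch, not Gearpump code: the `Master` class and its `register_worker` method are hypothetical names, and a "restart" is modelled simply as constructing a new `Master` whose in-memory counter starts over.

```python
import itertools


class Master:
    """Hypothetical toy master that issues sequential worker ids."""

    def __init__(self):
        # The counter lives only in memory, so it is lost on restart.
        self._next_id = itertools.count(0)

    def register_worker(self):
        # Hand out the next id in the sequence 0, 1, 2, ...
        return next(self._next_id)


# Before the restart: an existing worker registers and receives id 0.
master = Master()
old_worker_id = master.register_worker()

# Restart with no standby master: the issued-id list is gone.
master = Master()
new_worker_id = master.register_worker()

# The old worker still holds id 0, and the new worker was given id 0 too.
print(old_worker_id, new_worker_id)
```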

*Suggested fix?*
Instead of using the plain sequence 0, 1, 2, 3, 4... for worker ids, we append 
a timestamp: the time at which the worker registers itself with the master.

Like this:
{quote}
WorkerId(0, timestamp1)
WorkerId(1, timestamp2)
...
{quote}

Then, when the master is restarted, new and old workers can be told apart by 
the timestamp, because their registration times differ. 
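
A minimal sketch of the suggested id, under stated assumptions: the field names `seq` and `register_time`, the millisecond unit, and the injectable `clock` parameter are all illustrative choices, not taken from the Gearpump code base. The `Master` class is again a hypothetical toy model.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerId:
    seq: int            # sequence number; starts again at 0 after a restart
    register_time: int  # registration time (milliseconds assumed here)


class Master:
    """Hypothetical toy master that issues (sequence, timestamp) ids."""

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        self._next_seq = itertools.count(0)
        self._clock = clock  # injectable so the example is deterministic

    def register_worker(self):
        return WorkerId(next(self._next_seq), self._clock())


# The old worker registered before the restart, the new one after it.
old_id = Master(clock=lambda: 1000).register_worker()
new_id = Master(clock=lambda: 2000).register_worker()

# Same sequence number, but the ids compare as different.
print(old_id.seq == new_id.seq)
print(old_id == new_id)
```

The two ids differ only because the registration times differ; two registrations carrying the same sequence number and the same timestamp would still collide.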



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
