[
https://issues.apache.org/jira/browse/MESOS-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724994#comment-14724994
]
Klaus Ma commented on MESOS-3351:
---------------------------------
It's weird :); the masterInfo ID is generated from "currentDate" + ip + port + pid.
It seems the master's port was re-used across Stop(master.get())/StartMaster, and
the slave also re-used its port.
> nextSlaveId in master was not updated when recover
> --------------------------------------------------
>
> Key: MESOS-3351
> URL: https://issues.apache.org/jira/browse/MESOS-3351
> Project: Mesos
> Issue Type: Bug
> Components: master
> Environment: Mac OS (Darwin da-macbookair.cn.ibm.com 14.5.0 Darwin
> Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015;
> root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64)
> Reporter: Klaus Ma
> Assignee: Klaus Ma
> Labels: race-condition, uuid
> Attachments: test.log
>
>
> When a slave registers with the master, the master generates a slave ID for it
> as masterInfo.id + "-S" + nextSlaveId (in master.cpp) to avoid duplicate
> slave IDs. But if the master fails over, nextSlaveId is reset to 0, which may
> produce duplicate slave IDs between an old slave and a new slave.
> So far, it has only been reproduced on Mac OS, and only intermittently; it can
> NOT be reproduced on Ubuntu. Not sure about other OSes.
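
The collision described above can be sketched roughly as follows. This is a
minimal illustration, not the actual code in master.cpp: MasterSketch,
makeMasterId and demonstrateCollision are hypothetical names, and the "-"
separators and the date string in the master ID are assumptions. Only the
masterInfo.id + "-S" + nextSlaveId layout and the four master ID inputs come
from the report and the comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a master ID from the four inputs named in the
// comment (date, ip, port, pid). The "-" separators are an assumption.
std::string makeMasterId(const std::string& date, const std::string& ip,
                         int port, int pid) {
  return date + "-" + ip + "-" + std::to_string(port) + "-" +
         std::to_string(pid);
}

// Hypothetical stand-in for the master's state; not the real Mesos class.
struct MasterSketch {
  std::string id;     // plays the role of masterInfo.id
  long nextSlaveId;   // starts at 0 every time a master is started

  explicit MasterSketch(const std::string& id_) : id(id_), nextSlaveId(0) {}

  // Slave ID layout from the report: masterInfo.id + "-S" + nextSlaveId.
  std::string newSlaveId() {
    return id + "-S" + std::to_string(nextSlaveId++);
  }
};

// If a restarted master ends up with the same ID inputs (same date string,
// ip, re-used port, same pid within one test process), its first slave ID
// equals the first slave ID handed out before the restart.
void demonstrateCollision() {
  MasterSketch before(makeMasterId("20150901", "127.0.0.1", 5050, 1234));
  MasterSketch after(makeMasterId("20150901", "127.0.0.1", 5050, 1234));
  assert(before.newSlaveId() == after.newSlaveId());
}
```

In this sketch, changing any one input to makeMasterId (for example the port)
is enough to make the two slave IDs differ, which would be consistent with the
duplicate only showing up when the port is re-used.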
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)