Erik Weathers created MESOS-4737:
------------------------------------
Summary: document TaskID uniqueness requirement
Key: MESOS-4737
URL: https://issues.apache.org/jira/browse/MESOS-4737
Project: Mesos
Issue Type: Task
Components: documentation
Affects Versions: 0.27.0
Reporter: Erik Weathers
Assignee: Erik Weathers
Priority: Minor
The comments above the definition of TaskID in
[mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
suggest that it is OK to reuse a TaskID value so long as you
guarantee that at most one task with that ID is running at any given time.
{code title=existing comments for TaskID}
* A framework generated ID to distinguish a task. The ID must remain
* unique while the task is active. However, a framework can reuse an
* ID _only_ if a previous task with the same ID has reached a
* terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
{code}
However, there are a few scenarios where problems can arise.
# The checkpointing-and-recovery feature of mesos-slave/agent clashes with
tasks that reuse an ID and get assigned to the same executor.
#* See [this
email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
for more info, as well as the attachment on this issue.
# Network partitions and master failover can make a TaskID appear to be unique
in the system when in fact another task with that ID is still running and was
merely partitioned away for some time.
In light of these issues, we should update the documentation to make it
abundantly clear that reusing TaskIDs is never OK. At a minimum this should
involve updating the aforementioned comments in {{mesos.proto}}. Any
framework development guides that discuss TaskID creation should be updated as well.
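
As an illustration of what such a guide might recommend, here is a minimal sketch of generating TaskID values that are never reused. The helper name {{new_task_id}} and the prefix format are hypothetical and not part of any Mesos API; the sketch only assumes that the resulting string ends up in the {{value}} field of the TaskID message.

```python
import uuid


def new_task_id(prefix: str = "task") -> str:
    """Return a fresh TaskID value (hypothetical helper, not a Mesos API).

    A random UUID is appended to a human-readable prefix, so the framework
    does not need to track which IDs have already reached a terminal state.
    """
    # uuid4() is randomly generated; a collision with any earlier ID,
    # including one held by a task that is partitioned away, is
    # practically impossible.
    return f"{prefix}-{uuid.uuid4()}"


if __name__ == "__main__":
    # Relaunching the "same" logical job still yields a distinct TaskID.
    first = new_task_id("my-job")
    retry = new_task_id("my-job")
    print(first)
    print(retry)
```

Keeping the logical job name in the prefix preserves readability in logs, while the random suffix avoids the recovery and partition scenarios described above.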