[
https://issues.apache.org/jira/browse/AURORA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695767#comment-14695767
]
Maxim Khutornenko commented on AURORA-1193:
-------------------------------------------
I compiled all existing Mesos task status update reasons and messages into the
following table:
|| Mesos task reason || Mesos message || Aurora message || Aurora task status || Comments ||
| REASON_COMMAND_EXECUTOR_FAILED | "Abnormal executor termination" | same | FAILED | |
| REASON_EXECUTOR_PREEMPTED | none | none | LOST | |
| REASON_EXECUTOR_TERMINATED | "Executor terminating/terminated" | same | LOST | |
| REASON_EXECUTOR_UNREGISTERED | "Unregistered executor" | same | KILLED | _Very confusing_ |
| REASON_FRAMEWORK_REMOVED | "Framework <id> removed" | same | KILLED | |
| REASON_GC_ERROR | "Could not launch the task because we failed to unschedule directories scheduled for gc" | same | LOST | _Potentially confusing_ |
| REASON_INVALID_FRAMEWORKID | unused | | | |
| REASON_INVALID_OFFERS | "Task launched with invalid offers: <details>" | same | LOST | |
| REASON_MASTER_DISCONNECTED | "Master disconnected" | same | LOST | |
| REASON_MEMORY_LIMIT | none | "Task used more memory than requested" | FAILED | |
| REASON_RECONCILIATION | "Reconciliation: <Latest task state \| Task is unknown to the slave \| Task is unknown>" | same | LOST | |
| REASON_RESOURCES_UNKNOWN | "The checkpointed resources being used by the task are unknown to the slave" | same | LOST | _Potentially confusing_ |
| REASON_SLAVE_DISCONNECTED | "Slave <hostname> disconnected" | same | LOST | |
| REASON_SLAVE_REMOVED | "Slave <hostname> removed: <reason>" | same | LOST | |
| REASON_SLAVE_RESTARTED | "Task launched during slave restart" | same | LOST | |
| REASON_SLAVE_UNKNOWN | unused | | | |
| REASON_TASK_INVALID | <reason> | same | LOST | |
| REASON_TASK_UNAUTHORIZED | "Authorization failure: <failure> Not authorized to launch as user <user>" | same | LOST | |
| REASON_TASK_UNKNOWN | "Task is unknown to the slave" | same | LOST | |

Most status update messages are actually quite meaningful, and some contain
debugging details that Aurora could not reproduce on its side. I am now of the
opinion that we should not forcibly alter or suppress all messages. Instead, we
should address only the few that a) appear frequently and b) are potentially
confusing. Of those flagged in the Comments column above, only "Unregistered
executor" meets both criteria. I propose suppressing only that one within the
scope of this ticket.
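A minimal Python sketch of the proposed filtering, assuming a hypothetical hook that sees the Mesos reason and message before a TaskEvent is recorded. The `Reason` enum mirrors only a few reason names from the table; `SUPPRESSED_REASONS`, `FALLBACK_MESSAGES`, and `task_event_message` are illustrative names, not Aurora or Mesos APIs. Only REASON_EXECUTOR_UNREGISTERED is suppressed; every other Mesos message passes through unchanged, and an Aurora-side message is used only when Mesos sends none (as with REASON_MEMORY_LIMIT).

```python
from enum import Enum, auto
from typing import Optional


class Reason(Enum):
    """Hypothetical subset of Mesos TaskStatus.Reason names from the table."""
    REASON_EXECUTOR_UNREGISTERED = auto()
    REASON_MEMORY_LIMIT = auto()
    REASON_SLAVE_REMOVED = auto()
    REASON_GC_ERROR = auto()


# Reasons whose Mesos message is hidden from the UI (only one, per the proposal).
SUPPRESSED_REASONS = frozenset({Reason.REASON_EXECUTOR_UNREGISTERED})

# Aurora-side text used only when Mesos sends no message for a reason.
FALLBACK_MESSAGES = {
    Reason.REASON_MEMORY_LIMIT: "Task used more memory than requested",
}


def task_event_message(reason: Optional[Reason],
                       mesos_message: Optional[str]) -> Optional[str]:
    """Return the message to record in a TaskEvent, or None to record none."""
    if reason in SUPPRESSED_REASONS:
        return None
    if mesos_message:
        # Keep the Mesos text as-is: it often carries debugging details.
        return mesos_message
    return FALLBACK_MESSAGES.get(reason)
```

Keying the filter on the reason rather than on the message text keeps it stable if Mesos rewords "Unregistered executor", and it leaves room to surface the reason itself in the UI later.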
> Improve UI task status reporting experience
> -------------------------------------------
>
> Key: AURORA-1193
> URL: https://issues.apache.org/jira/browse/AURORA-1193
> Project: Aurora
> Issue Type: Story
> Reporter: Maxim Khutornenko
> Priority: Minor
>
> Mesos may append an optional message to a task status update; this message
> currently surfaces in the UI via TaskEvent. These messages may not be
> user-friendly and can add confusion. One example is "Unregistered executor",
> issued when Mesos kills an assigned task that did not have a chance to run yet.
> While this message does not indicate a failure, it may create the illusion of
> abnormal behavior in an otherwise normal operation.
> Consider filtering/formatting messages in the UI/scheduler to avoid a confusing
> user experience. The ideal solution should also leverage the TaskStatus.reason
> field to show additional status details.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)