[ 
https://issues.apache.org/jira/browse/AURORA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695767#comment-14695767
 ] 

Maxim Khutornenko commented on AURORA-1193:
-------------------------------------------

I compiled all existing Mesos task status update reasons and messages into the 
table below:

|| Mesos task reason || Mesos message || Aurora message || Aurora task status || Comments ||
| REASON_COMMAND_EXECUTOR_FAILED | "Abnormal executor termination" | same | FAILED | |
| REASON_EXECUTOR_PREEMPTED | none | none | LOST | |
| REASON_EXECUTOR_TERMINATED | "Executor terminating/terminated" | same | LOST | |
| REASON_EXECUTOR_UNREGISTERED | "Unregistered executor" | same | KILLED | _Very confusing_ |
| REASON_FRAMEWORK_REMOVED | "Framework <id> removed" | same | KILLED | |
| REASON_GC_ERROR | "Could not launch the task because we failed to unschedule directories scheduled for gc" | same | LOST | _Potentially confusing_ |
| REASON_INVALID_FRAMEWORKID | unused | | | |
| REASON_INVALID_OFFERS | "Task launched with invalid offers: <details>" | same | LOST | |
| REASON_MASTER_DISCONNECTED | "Master disconnected" | same | LOST | |
| REASON_MEMORY_LIMIT | none | "Task used more memory than requested" | FAILED | |
| REASON_RECONCILIATION | "Reconciliation: <Latest task state \| Task is unknown to the slave \| Task is unknown>" | same | LOST | |
| REASON_RESOURCES_UNKNOWN | "The checkpointed resources being used by the task are unknown to the slave" | same | LOST | _Potentially confusing_ |
| REASON_SLAVE_DISCONNECTED | "Slave <hostname> disconnected" | same | LOST | |
| REASON_SLAVE_REMOVED | "Slave <hostname> removed: <reason>" | same | LOST | |
| REASON_SLAVE_RESTARTED | "Task launched during slave restart" | same | LOST | |
| REASON_SLAVE_UNKNOWN | unused | | | |
| REASON_TASK_INVALID | <reason> | same | LOST | |
| REASON_TASK_UNAUTHORIZED | "Authorization failure: <failure> Not authorized to launch as user <user>" | same | LOST | |
| REASON_TASK_UNKNOWN | "Task is unknown to the slave" | same | LOST | |

The majority of status update messages are actually quite meaningful, and some 
contain very helpful debugging info that would be impossible to reproduce on 
the Aurora side. I am now of the opinion that we should not forcefully alter or 
suppress all messages. Instead, we should address only the few that a) appear 
frequently and b) are potentially confusing. Of those flagged in the Comments 
column above, only "Unregistered executor" meets both criteria. I propose 
suppressing only that one within the scope of this ticket.
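
To make the proposal concrete, here is a rough sketch of the filtering rule in 
Python. It is not Aurora scheduler code: the function name and both lookup 
tables are hypothetical, and keying the suppression on the reason (rather than 
on the message text) is my assumption. Only the reason names and message 
strings come from the table above.

```python
from typing import Optional

# Hypothetical: reasons whose Mesos message is hidden from the UI.
# Per the proposal, only "Unregistered executor" is suppressed.
SUPPRESSED_REASONS = frozenset({"REASON_EXECUTOR_UNREGISTERED"})

# Messages supplied on the Aurora side when Mesos sends none
# (taken from the "Aurora message" column of the table).
SUBSTITUTED_MESSAGES = {
    "REASON_MEMORY_LIMIT": "Task used more memory than requested",
}


def display_message(reason: Optional[str], message: Optional[str]) -> Optional[str]:
    """Return the text to show for a status update, or None to show nothing."""
    if reason in SUPPRESSED_REASONS:
        return None
    if not message:
        # No Mesos message: fall back to an Aurora-side message, if any.
        return SUBSTITUTED_MESSAGES.get(reason)
    # Everything else passes through unchanged ("same" in the table).
    return message
```

With this rule, a "Master disconnected" message would still reach the UI 
untouched, while "Unregistered executor" would be dropped.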

> Improve UI task status reporting experience
> -------------------------------------------
>
>                 Key: AURORA-1193
>                 URL: https://issues.apache.org/jira/browse/AURORA-1193
>             Project: Aurora
>          Issue Type: Story
>            Reporter: Maxim Khutornenko
>            Priority: Minor
>
> Mesos may attach an optional message to a task status update, and that 
> message currently surfaces in the UI via TaskEvent. These messages may not be 
> user friendly and can add confusion. One example is "Unregistered executor", 
> issued when Mesos kills an assigned task that did not have a chance to run 
> yet. Although this message does not indicate a failure, it may create an 
> illusion of abnormal behavior in an otherwise normal operation.
> Consider filtering/formatting messages in the UI/scheduler to avoid an adverse 
> user experience. The ideal solution should also leverage the 
> TaskStatus.reason field to show additional status details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
