[ 
https://issues.apache.org/jira/browse/MESOS-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355163#comment-16355163
 ] 

Qian Zhang commented on MESOS-8502:
-----------------------------------

From the attached log, I can see that the agent received the TASK_RUNNING
status update from the executor for the first task:
{code:java}
I0125 15:59:10.749639 13385 slave.cpp:4809] Handling status update TASK_RUNNING 
(Status UUID: 3f205fcb-84c2-4448-91ca-fe3c5240c2a4) for task 
2d82cb33-3a9f-4dba-a1b4-e27819dec216 of framework 
3cef0c75-fa4f-4ec3-a995-d8b63e9b571e-0000
{code}
But that status update was never forwarded to the master. I only see that the
TASK_RUNNING status update for the second task was forwarded to the master:
{code:java}
I0125 15:59:10.757285 13385 slave.cpp:5291] Forwarding the update TASK_RUNNING 
(Status UUID: 2c93c767-ab4a-4bb5-932b-b3d266b0a950) for task 
6d37c2ba-16ff-4cc8-80b7-9a1e673ce5b2 of framework 
3cef0c75-fa4f-4ec3-a995-d8b63e9b571e-0000 to [email protected]:45463
{code}
I am not sure why the TASK_RUNNING status update for the first task was lost
in the agent.
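
To check this across the whole attached log rather than by eye, a small script along these lines could list every status update that the agent handled but never forwarded. This is only a sketch: the two regular expressions are modeled on the two log lines quoted above, the function name `unforwarded_updates` is made up for illustration, and `SAMPLE_LOG` contains just those two quoted lines, not the full attachment.

```python
import re

# Patterns modeled on the two agent log lines quoted in this comment
# (assumption: other Mesos versions may word these messages differently).
HANDLED = re.compile(
    r"Handling status update (\w+) \(Status UUID: ([0-9a-f-]+)\) for task (\S+)")
FORWARDED = re.compile(
    r"Forwarding the update (\w+) \(Status UUID: ([0-9a-f-]+)\) for task (\S+)")

# Only the two excerpts quoted above, with their original line wrapping.
SAMPLE_LOG = """
I0125 15:59:10.749639 13385 slave.cpp:4809] Handling status update TASK_RUNNING
(Status UUID: 3f205fcb-84c2-4448-91ca-fe3c5240c2a4) for task
2d82cb33-3a9f-4dba-a1b4-e27819dec216 of framework
3cef0c75-fa4f-4ec3-a995-d8b63e9b571e-0000
I0125 15:59:10.757285 13385 slave.cpp:5291] Forwarding the update TASK_RUNNING
(Status UUID: 2c93c767-ab4a-4bb5-932b-b3d266b0a950) for task
6d37c2ba-16ff-4cc8-80b7-9a1e673ce5b2 of framework
3cef0c75-fa4f-4ec3-a995-d8b63e9b571e-0000 to [email protected]:45463
"""


def unforwarded_updates(log_text):
    """Return sorted (task_id, state, uuid) tuples handled but never forwarded."""
    # The quoted log lines are wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    handled = {(m.group(3), m.group(1), m.group(2))
               for m in HANDLED.finditer(flat)}
    forwarded = {(m.group(3), m.group(1), m.group(2))
                 for m in FORWARDED.finditer(flat)}
    return sorted(handled - forwarded)


for task_id, state, uuid in unforwarded_updates(SAMPLE_LOG):
    print(f"{state} for task {task_id} (UUID {uuid}) was handled but not forwarded")
```

On the two excerpts above this reports only the first task's TASK_RUNNING update. Running it on the full attachment would need the file read in as `log_text`.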

> The test `DefaultExecutorTest.KillTaskGroupOnTaskFailure` is flaky
> ------------------------------------------------------------------
>
>                 Key: MESOS-8502
>                 URL: https://issues.apache.org/jira/browse/MESOS-8502
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>         Environment: CI
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Major
>              Labels: flaky-test
>         Attachments: KillTaskGroupOnTaskFailure-badrun.txt
>
>
> {code:java}
> ../../src/tests/default_executor_tests.cpp:718: Failure
> Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_, 
> AllOf( TaskStatusUpdateTaskIdEq(taskInfo1), 
> TaskStatusUpdateStateEq(v1::TASK_RUNNING))))...
> Expected: to be called once
> Actual: never called - unsatisfied and active
> ../../src/tests/default_executor_tests.cpp:729: Failure
> Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_, 
> AllOf( TaskStatusUpdateTaskIdEq(taskInfo1), 
> TaskStatusUpdateStateEq(v1::TASK_FAILED))))...
> Expected: to be called once
> Actual: never called - unsatisfied and active
> {code}
> From the detailed log in the attachment, it seems the root cause is that the
> agent did not get a chance to forward TASK_RUNNING to the master for the
> first task because the task failed immediately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
