[
https://issues.apache.org/jira/browse/MESOS-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355163#comment-16355163
]
Qian Zhang commented on MESOS-8502:
-----------------------------------
>From the attached log, I can see agent has received the TASK_RUNNING status
>update from executor for the first task:
{code:java}
I0125 15:59:10.749639 13385 slave.cpp:4809] Handling status update TASK_RUNNING
(Status UUID: 3f205fcb-84c2-4448-91ca-fe3c5240c2a4) for task
2d82cb33-3a9f-4dba-a1b4-e27819dec216 of framework
3cef0c75-fa4f-4ec3-a995-d8b63e9b571e-0000
{code}
But such status update was never forwarded to master, I only see theĀ
TASK_RUNNING status update for the second task was forwarded to master:
{code:java}
I0125 15:59:10.757285 13385 slave.cpp:5291] Forwarding the update TASK_RUNNING
(Status UUID: 2c93c767-ab4a-4bb5-932b-b3d266b0a950) for task
6d37c2ba-16ff-4cc8-80b7-9a1e673ce5b2 of framework
3cef0c75-fa4f-4ec3-a995-d8b63e9b571e-0000 to [email protected]:45463
{code}
Not sure why the TASK_RUNNING status update for the first task was lost in
agent.
> The test `DefaultExecutorTest.KillTaskGroupOnTaskFailure` is flaky
> ------------------------------------------------------------------
>
> Key: MESOS-8502
> URL: https://issues.apache.org/jira/browse/MESOS-8502
> Project: Mesos
> Issue Type: Bug
> Components: test
> Environment: CI
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
> Labels: flaky-test
> Attachments: KillTaskGroupOnTaskFailure-badrun.txt
>
>
> {code:java}
> ../../src/tests/default_executor_tests.cpp:718: Failure
> Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_,
> AllOf( TaskStatusUpdateTaskIdEq(taskInfo1),
> TaskStatusUpdateStateEq(v1::TASK_RUNNING))))...
> Expected: to be called once
> Actual: never called - unsatisfied and active
> ../../src/tests/default_executor_tests.cpp:729: Failure
> Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_,
> AllOf( TaskStatusUpdateTaskIdEq(taskInfo1),
> TaskStatusUpdateStateEq(v1::TASK_FAILED))))...
> Expected: to be called once
> Actual: never called - unsatisfied and active
> {code}
> From the detailed log in the attachment, it seems the root cause is that
> agent did not get a chance to forward TASK_RUNNING to master for the first
> task because it failed immediately.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)