[ https://issues.apache.org/jira/browse/MESOS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382531#comment-16382531 ]
Meng Zhu commented on MESOS-8617:
---------------------------------
This is due to https://issues.apache.org/jira/browse/MESOS-1720. If you look at
the two failures, both involve launching multiple tasks (in different groups).
In MESOS-1720 we ask the master to specify the `launch_executor` flag, so the
first task sets the flag to true and the subsequent tasks set it to false (as
they reuse the same executor). However, in these failures, while the agent
received the two tasks in order, it launched them in a different order. The task
that did not set the `launch_executor` flag got processed first, triggering the
kill here:
https://github.com/apache/mesos/blob/32f6d4eec2724414e217875f4f7d3b2538db5381/src/slave/slave.cpp#L2888
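To make the failure mode concrete, here is a minimal, self-contained C++ model of the check described above. It is a hypothetical simplification, not Mesos code: `LaunchRequest`, `Agent::launch`, `replay` and `SketchTaskState` are illustrative names, and the only rule modeled is that a request with `launch_executor` unset is dropped (surfacing as TASK_LOST in the tests) when its executor does not exist yet.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the agent-side check; names are illustrative only.
enum class SketchTaskState { Running, Lost };

struct LaunchRequest {
  std::string taskId;
  std::string executorId;
  bool launchExecutor;  // Set by the master: true only for the first task.
};

class Agent {
 public:
  SketchTaskState launch(const LaunchRequest& request) {
    const bool executorExists = executors_.count(request.executorId) > 0;

    if (!executorExists && !request.launchExecutor) {
      // The master assumed the executor was already running, but the agent
      // has not launched it yet, so the task is dropped (TASK_LOST).
      return SketchTaskState::Lost;
    }

    if (!executorExists) {
      executors_.insert(request.executorId);  // Launch the shared executor.
    }
    return SketchTaskState::Running;
  }

 private:
  std::set<std::string> executors_;
};

// Feeds the requests to a fresh agent in the given processing order.
std::vector<SketchTaskState> replay(const std::vector<LaunchRequest>& order) {
  Agent agent;
  std::vector<SketchTaskState> states;
  for (const LaunchRequest& request : order) {
    states.push_back(agent.launch(request));
  }
  return states;
}
```

With `first = {"1", "executor", true}` and `second = {"2", "executor", false}`, `replay({first, second})` yields two `Running` states, while `replay({second, first})` reports task '2' as `Lost`, which mirrors the reordering in the failing `TasksEndpoint` run.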
The symptom shows up in the task authorizations.
In `TasksEndpoint`:
{noformat}
I0224 05:58:46.330776 26767 slave.cpp:2326] Authorizing task '1' for framework ff1a1d90-0d7d-4d56-bb2e-abfb55ce026c-0000
I0224 05:58:46.330806 26767 slave.cpp:8363] Authorizing framework principal 'test-principal' to launch task 1
I0224 05:58:46.330883 26767 slave.cpp:2326] Authorizing task '2' for framework ff1a1d90-0d7d-4d56-bb2e-abfb55ce026c-0000
I0224 05:58:46.330907 26767 slave.cpp:8363] Authorizing framework principal 'test-principal' to launch task 2
I0224 05:58:46.331068 26770 slave.cpp:2772] Launching task '2' for framework ff1a1d90-0d7d-4d56-bb2e-abfb55ce026c-0000
{noformat}
In `ROOT_TaskGroupsSharingViaSandboxVolumes`:
{noformat}
I0301 01:52:39.551573 11844 slave.cpp:2326] Authorizing task group containing tasks [ producer ] for framework 9229614e-2680-4f62-b49c-3bd6f72734c7-0000
I0301 01:52:39.551717 11844 slave.cpp:8363] Authorizing framework principal 'test-principal' to launch task producer
I0301 01:52:39.552479 11844 slave.cpp:2326] Authorizing task group containing tasks [ consumer ] for framework 9229614e-2680-4f62-b49c-3bd6f72734c7-0000
I0301 01:52:39.552634 11844 slave.cpp:8363] Authorizing framework principal 'test-principal' to launch task consumer
I0301 01:52:39.553275 11843 slave.cpp:2772] Launching task group containing tasks [ consumer ] for framework 9229614e-2680-4f62-b49c-3bd6f72734c7-0000
{noformat}
In both cases, the task that was authorized later was launched first. This
should not have happened.
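One way to enforce the ordering expected here is to hand out a ticket when each launch message arrives and release launches strictly in ticket order, regardless of when each authorization finishes. The sketch below is a hypothetical, single-threaded illustration of that idea: the `LaunchSequencer` class and its `enqueue`/`authorized` methods are invented for this example and are not the actual Mesos change. In the agent, authorization completes asynchronously, which is modeled here by explicit `authorized()` calls; denied authorizations are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: releases task launches in the order the launch
// messages were received, even if authorizations complete out of order.
class LaunchSequencer {
 public:
  // Records an incoming launch message and returns its ticket.
  std::size_t enqueue(const std::string& taskId) {
    const std::size_t ticket = nextTicket_++;
    tasks_[ticket] = taskId;
    return ticket;
  }

  // Marks the authorization for `ticket` as complete and returns the task IDs
  // that may now be launched, in receipt order.
  std::vector<std::string> authorized(std::size_t ticket) {
    done_.insert(ticket);

    std::vector<std::string> launchable;
    // Release the longest run of consecutive tickets whose authorization is done.
    while (done_.count(nextToLaunch_) > 0) {
      launchable.push_back(tasks_.at(nextToLaunch_));
      done_.erase(nextToLaunch_);
      tasks_.erase(nextToLaunch_);
      ++nextToLaunch_;
    }
    return launchable;
  }

 private:
  std::size_t nextTicket_ = 0;    // Ticket for the next incoming message.
  std::size_t nextToLaunch_ = 0;  // Oldest ticket not yet released.
  std::map<std::size_t, std::string> tasks_;  // Pending task IDs by ticket.
  std::set<std::size_t> done_;  // Authorized but not yet released.
};
```

Replaying the `TasksEndpoint` log: enqueue task '1' and then task '2'. If task '2' is authorized first, `authorized()` returns nothing and task '2' waits. Once task '1' is authorized, both are released as `{"1", "2"}`, so the task carrying `launch_executor = true` always reaches the launch path first.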
> Tests using default executor occasionally fail.
> -----------------------------------------------
>
> Key: MESOS-8617
> URL: https://issues.apache.org/jira/browse/MESOS-8617
> Project: Mesos
> Issue Type: Bug
> Reporter: Alexander Rukletsov
> Priority: Major
> Labels: flaky-test
> Attachments: MasterTest.TasksEndpoint-badrun.txt,
> MasterTest.TasksEndpoint-goodrun.txt,
> ROOT_TaskGroupsSharingViaSandboxVolumes-badrun.txt
>
>
> A task state transition expectation can be violated, resulting in a failing test, e.g.:
> {noformat}
> ../../src/tests/master_tests.cpp:4134: Failure
> Expected: TASK_RUNNING
> To be equal to: status1->state()
> Which is: TASK_LOST
> {noformat}
> List of known affected tests:
> {noformat}
> MasterTest.TasksEndpoint
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskGroupsSharingViaSandboxVolumes
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)