[
https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213484#comment-16213484
]
Alexander Rukletsov commented on MESOS-8096:
--------------------------------------------
There are at least two races in the test code around the v1 scheduler/executor
and the driver libraries. Only the scheduler case is described below; the
executor case is the same, except that it has one more race, which was already
fixed for the scheduler in MESOS-4029.
h4. The scheduler might not be fully constructed before the driver library starts to use it.
When we
[initialize|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2159-L2165]
the scheduler driver library, we pass an [{{events}}
callback|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2176],
which
[uses|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2211]
the [member
variable|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2216]
of the library test wrapper. The scheduler library can start using the
callback right after it is constructed, even before the library test wrapper
[fully
initializes|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2180-L2181].
This leads to segfaults when a {{SUBSCRIBED}} event is passed to the
not-yet-initialized scheduler.
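
To make the ordering problem concrete, here is a minimal sketch of one way to avoid it. It does not mirror the actual Mesos classes: {{FakeLibrary}}, {{FakeScheduler}} and {{SafeWrapper}} are hypothetical names, the model uses composition rather than the inheritance used in the test code, and the library invokes the callback synchronously from its constructor to stand in for "may be called right after construction". The only point it illustrates is that the callback's target has to be initialized before the library is created.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the scheduler library. It invokes the
// callback immediately, modelling a library that may deliver events
// as soon as it has been constructed.
class FakeLibrary
{
public:
  explicit FakeLibrary(std::function<void(const std::string&)> callback)
  {
    callback("SUBSCRIBED");
  }
};

// Hypothetical stand-in for the mock scheduler: records received events.
struct FakeScheduler
{
  std::vector<std::string> received;

  void handle(const std::string& event) { received.push_back(event); }
};

// Hypothetical wrapper that initializes the callback's target before
// the library exists.
class SafeWrapper
{
public:
  explicit SafeWrapper(std::shared_ptr<FakeScheduler> s)
    : scheduler(std::move(s)),
      library([this](const std::string& event) { events(event); }) {}

private:
  void events(const std::string& event)
  {
    // Holds because `scheduler` is declared, and therefore
    // initialized, before `library`.
    assert(scheduler != nullptr);
    scheduler->handle(event);
  }

  // Members are initialized in declaration order, so `scheduler` is
  // set before `library` can invoke the callback.
  std::shared_ptr<FakeScheduler> scheduler;
  FakeLibrary library;
};
```

If the two members were declared in the opposite order, the callback would run while {{scheduler}} is still uninitialized, which is the same class of failure as the one described above.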
h4. The scheduler library might be destroyed while the scheduler still uses it.
This is not fully fixed by MESOS-4029; see {{"AsyncExecutorProcess-badrun-3"}}.
Passing the scheduler driver's {{this}} to the scheduler
[here|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2177]
is unsafe: a raw pointer does not guarantee that the library is still alive
when the scheduler uses it.
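
As an illustration of the lifetime problem only, and not the fix applied in Mesos, the sketch below replaces the raw pointer with a non-owning {{std::weak_ptr}} that is checked before each use. {{FakeMesos}} and {{FakeScheduler}} are hypothetical names, and the sketch assumes the library is owned through a {{std::shared_ptr}}, which may not match the test code.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the scheduler library: records sent calls.
class FakeMesos
{
public:
  void send(const std::string& call) { sent.push_back(call); }

  std::vector<std::string> sent;
};

// Hypothetical scheduler that keeps a non-owning handle to the library
// instead of a raw pointer.
class FakeScheduler
{
public:
  void connected(const std::shared_ptr<FakeMesos>& mesos)
  {
    library = mesos;
  }

  // Returns true if the call was handed to the library, false if the
  // library has already been destroyed.
  bool subscribe()
  {
    // lock() yields an owning pointer only while the library is alive,
    // and keeps it alive for the duration of this call.
    if (std::shared_ptr<FakeMesos> mesos = library.lock()) {
      mesos->send("SUBSCRIBE");
      return true;
    }

    return false;
  }

private:
  std::weak_ptr<FakeMesos> library;
};
```

With a raw pointer, the second {{subscribe()}} in a sequence such as "connect, subscribe, destroy the library, subscribe" would dereference freed memory; here it returns {{false}} instead.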
> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -------------------------------------------------------------
>
> Key: MESOS-8096
> URL: https://issues.apache.org/jira/browse/MESOS-8096
> Project: Mesos
> Issue Type: Bug
> Components: scheduler driver, test
> Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
> Labels: flaky-test, mesosphere
> Attachments: AsyncExecutorProcess-badrun-1.txt,
> AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt
>
>
> Various tests segfault for an as yet unknown reason. Comparing the attached
> logs suggests that the problem might be in the scheduler's event queue.