[ 
https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213484#comment-16213484
 ] 

Alexander Rukletsov commented on MESOS-8096:
--------------------------------------------

There are at least two races in the test code around the v1 scheduler/executor 
and driver libraries. Only the scheduler case is described below; the executor 
case is the same, except that it has one more race, which was fixed for the 
scheduler in MESOS-4029.

h4. The scheduler might not be fully constructed before the driver library starts to use it.
When we 
[initialize|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2159-L2165]
 the scheduler driver library, we pass an [{{events}} 
callback|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2176],
 which 
[uses|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2211]
 the [member 
variable|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2216]
 of the library test wrapper. The scheduler library can start using the 
callback right after it is constructed, even before the library test wrapper 
[fully 
initializes|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2180-L2181].
 This leads to segfaults when a {{SUBSCRIBED}} event is passed to the 
not-yet-initialized scheduler.
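
A minimal sketch of one way to close this window, assuming the callback may block briefly; it is not necessarily the fix adopted in Mesos. The callback waits on a gate that the wrapper opens only after every member it touches has been initialized. All names here ({{FakeLibrary}}, {{RecordingScheduler}}, {{TestWrapper}}) are hypothetical stand-ins, not Mesos classes.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the driver library: it may invoke the callback
// from its own thread as soon as it is constructed.
class FakeLibrary
{
public:
  explicit FakeLibrary(std::function<void(const std::string&)> callback)
    : worker([callback]() { callback("SUBSCRIBED"); }) {}

  ~FakeLibrary() { worker.join(); }

private:
  std::thread worker;
};

// Hypothetical stand-in for the mock scheduler used in tests.
struct RecordingScheduler
{
  std::vector<std::string> received;
  void handle(const std::string& event) { received.push_back(event); }
};

// Hypothetical test wrapper. The callback blocks on `ready` until the
// constructor has finished initializing every member the callback touches.
class TestWrapper
{
public:
  explicit TestWrapper(std::shared_ptr<RecordingScheduler> s)
    : ready(gate.get_future().share())
  {
    library.reset(new FakeLibrary([this](const std::string& event) {
      ready.wait();  // Do not touch members before construction is done.
      assert(scheduler != nullptr);
      scheduler->handle(event);
    }));

    scheduler = std::move(s);  // Set after the library has already started.
    gate.set_value();          // Open the gate: members are now safe to use.
  }

private:
  // Declaration order matters: `library` is destroyed first, which joins
  // the worker thread before the other members go away.
  std::promise<void> gate;
  std::shared_future<void> ready;
  std::shared_ptr<RecordingScheduler> scheduler;
  std::unique_ptr<FakeLibrary> library;
};
```

In this sketch the event is delivered only after {{scheduler}} is set, even though the library thread starts earlier. Without the {{ready.wait()}} call the callback could dereference a null {{scheduler}}, which corresponds to the segfault described above.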

h4. The scheduler library might be destroyed while the scheduler still uses it.
This is not fully fixed by MESOS-4029; see {{"AsyncExecutorProcess-badrun-3"}}. 
Passing the scheduler driver's {{this}} to the scheduler 
[here|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2177]
 is unsafe: nothing guarantees that the library is still alive when the 
scheduler later uses that pointer.
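
A minimal sketch of one way to avoid a dangling pointer, assuming the library can be held in a {{std::shared_ptr}}; this is an illustration, not the approach taken in the Mesos test code. The scheduler keeps a {{std::weak_ptr}} and locks it before every use. {{FakeMesos}} and {{FakeScheduler}} are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the scheduler library.
struct FakeMesos
{
  std::vector<std::string> sent;
  void send(const std::string& call) { sent.push_back(call); }
};

// Hypothetical scheduler that keeps a weak reference to the library
// instead of a raw pointer.
class FakeScheduler
{
public:
  explicit FakeScheduler(std::weak_ptr<FakeMesos> m) : mesos(std::move(m)) {}

  // Sends a call through the library if it still exists.
  // Returns false if the library has already been destroyed.
  bool trySend(const std::string& call)
  {
    // lock() yields an owning pointer, so the library cannot be destroyed
    // while `alive` is in scope.
    std::shared_ptr<FakeMesos> alive = mesos.lock();
    if (!alive) {
      return false;
    }
    alive->send(call);
    return true;
  }

private:
  std::weak_ptr<FakeMesos> mesos;
};
```

With a raw pointer the second call in a sequence such as "send, destroy library, send" would be a use-after-free; here it is detected and skipped.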

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -------------------------------------------------------------
>
>                 Key: MESOS-8096
>                 URL: https://issues.apache.org/jira/browse/MESOS-8096
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler driver, test
>         Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>              Labels: flaky-test, mesosphere
>         Attachments: AsyncExecutorProcess-badrun-1.txt, 
> AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt
>
>
> Various tests segfault for an as-yet-unknown reason. Comparing the attached 
> logs hints that the problem might be in the scheduler's event queue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
