[ https://issues.apache.org/jira/browse/MESOS-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699967#comment-14699967 ]

Greg Mann commented on MESOS-3264:
----------------------------------

Thanks for having a look at this, [[email protected]]! I had previously explored 
using similar shutdown hooks, and unfortunately they don't do the trick. I 
assume this is because the order in which shutdown hooks run is unspecified, 
and since they run concurrently, perhaps the JVM continues on to its 
post-shutdown-hook GC while the hooks are still executing. In any case, the 
tests continue to fail with such shutdown hooks placed in the constructors of 
the SchedulerDriver and/or the ExecutorDriver.
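
For reference, a hook of the kind described above can be sketched as follows. Per the {{Runtime.addShutdownHook}} documentation, the JVM starts registered hooks in an unspecified order and lets them run concurrently. {{HookedDriver}}, {{release()}}, and {{removeHook()}} are hypothetical stand-ins for a driver wrapping a native object, not the Mesos Java API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical driver wrapping a native object (not the Mesos API).
class HookedDriver {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final Thread hook;

    HookedDriver() {
        // Register the hook from the constructor. The JVM starts all
        // registered hooks in an unspecified order and runs them
        // concurrently, so this hook must not depend on any other hook.
        hook = new Thread(this::release, "driver-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
    }

    // Stand-in for freeing the native object; only the first call has
    // any effect, so an explicit call and the hook can both run safely.
    void release() {
        if (released.compareAndSet(false, true)) {
            // A real driver would release its native resources here.
        }
    }

    boolean isReleased() {
        return released.get();
    }

    // Deregister the hook, e.g. after an explicit release; returns true
    // if the hook had been registered.
    boolean removeHook() {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }
}
```

Because nothing orders one hook relative to another, {{release()}} is written to be idempotent rather than relying on any particular sequence.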

If we define the {{close()}} method as {{public}} and call it explicitly in the 
body of {{main()}}, the tests do pass reliably. However, conventional wisdom 
seems to hold that defining or calling a method that invokes {{finalize()}} in 
that way is A Bad Thing. Any thoughts? If we decide that it is acceptable to 
define a public {{close()}} method that calls {{finalize()}} for the 
SchedulerDriver, similar to the one in your patch, and to call it explicitly 
just before we call {{System.exit()}}, then that would solve this issue.
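
A minimal sketch of the explicit-close approach, assuming a hypothetical {{ClosableDriver}} class and a hypothetical {{ExampleFramework.run()}} entry point. Rather than calling {{finalize()}} directly (the practice the conventional wisdom warns against), this variant puts the cleanup in an idempotent {{close()}} from {{AutoCloseable}}, which is reached deterministically before {{System.exit()}}. The counter only stands in for releasing the native object.

```java
// Hypothetical driver (not the Mesos API). The counter stands in for
// releasing the native SchedulerDriver.
class ClosableDriver implements AutoCloseable {
    private int releases = 0;
    private boolean closed = false;

    // Idempotent: only the first call releases native resources, so any
    // later cleanup path (e.g. a finalizer delegating here) is harmless.
    @Override
    public synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        releases++; // a real driver would free the native object here
    }

    synchronized int releaseCount() {
        return releases;
    }
}

class ExampleFramework {
    // Hypothetical framework entry point. It returns the exit status
    // rather than calling System.exit() itself, so close() is guaranteed
    // to run before the caller starts JVM shutdown.
    static int run(ClosableDriver driver) {
        try {
            // ... driver.run() and framework logic would go here ...
            return 0;
        } finally {
            driver.close(); // explicit, deterministic teardown
        }
    }
}
```

In {{main()}} this would be used as {{System.exit(ExampleFramework.run(new ClosableDriver()));}}, so the driver is closed before the JVM begins shutting down.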

> JVM can exit prematurely following framework teardown
> -----------------------------------------------------
>
>                 Key: MESOS-3264
>                 URL: https://issues.apache.org/jira/browse/MESOS-3264
>             Project: Mesos
>          Issue Type: Bug
>          Components: java api
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Greg Mann
>            Priority: Minor
>              Labels: java, tech-debt
>
> In Java frameworks, it is possible for the JVM to begin exiting the program 
> (via {{System.exit()}}, for example) while teardown of native objects such as 
> the SchedulerDriver and associated Executors is still in progress. 
> {{SchedulerDriver::stop()}} returns after it has sent messages to other 
> actors telling them to begin their teardown; meanwhile, the JVM is free to 
> terminate the program and thus begin executing native object destructors 
> while those objects are still in use, potentially leading to a segfault.
> This has manifested itself in flaky tests from the ExamplesTest suite (see 
> MESOS-830 and MESOS-1013), as mutexes from glog are destroyed while the 
> framework is still shutting down and attempting to log.
> Ideally, a mechanism would exist to block the Java code until confirmation 
> has been received that framework teardown is complete.
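
One way such a blocking mechanism might look, sketched under the assumption that teardown completion can be signaled back to Java: {{stop()}} returns immediately after handing (simulated) teardown to another thread, and a {{java.util.concurrent.CountDownLatch}} lets the caller wait, with a timeout, for the confirmation. {{BlockingStopDriver}}, {{awaitTeardown()}}, and the sleep standing in for actor teardown are all hypothetical; this is not how the Mesos driver is implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not the Mesos API): stop() only starts teardown,
// and a latch lets Java code block until teardown is confirmed.
class BlockingStopDriver {
    private final CountDownLatch teardownDone = new CountDownLatch(1);
    private final ExecutorService actors = Executors.newSingleThreadExecutor();
    private volatile boolean tornDown = false;

    // Mirrors the behavior described above: returns as soon as the
    // teardown request has been handed to another thread.
    void stop() {
        actors.execute(() -> {
            try {
                Thread.sleep(50); // simulated asynchronous actor teardown
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                tornDown = true;
                teardownDone.countDown(); // the completion "confirmation"
            }
        });
        actors.shutdown(); // no further work; the thread exits when done
    }

    // Blocks until teardown is confirmed or the timeout elapses.
    // Returns true only if the confirmation arrived in time.
    boolean awaitTeardown(long timeout, TimeUnit unit) {
        try {
            return teardownDone.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isTornDown() {
        return tornDown;
    }
}
```

A framework's {{main()}} would then call {{stop()}}, wait on {{awaitTeardown()}}, and only afterwards call {{System.exit()}}; the timeout keeps a lost confirmation from hanging the program forever.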



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
