[ https://issues.apache.org/jira/browse/MESOS-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699967#comment-14699967 ]
Greg Mann commented on MESOS-3264:
----------------------------------
Thanks for having a look at this, [[email protected]]! I had previously explored
using similar shutdown hooks, and unfortunately they don't do the trick. I
assume this is because the order in which shutdown hooks run is unspecified.
And since the hooks run concurrently, perhaps the JVM continues on to its
post-shutdown-hook GC while they are still executing. In any case, the tests
continue to fail with such shutdown hooks placed in the constructors of the
SchedulerDriver and/or the ExecutorDriver.
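
For reference, here is a minimal sketch of the kind of shutdown hook discussed
above. {{ShutdownHookOrderDemo}} and {{newTeardownHook}} are hypothetical names,
and the sleeps only stand in for native teardown; this is not the Mesos driver
code. {{Runtime.addShutdownHook}} gives no ordering guarantee: registered hooks
are started in an unspecified order and run concurrently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; it is not part of the Mesos Java API.
class ShutdownHookOrderDemo {
    // Builds an unstarted thread that simulates a slow teardown and records
    // its progress in the shared log. The sleep is a placeholder for real work.
    static Thread newTeardownHook(String name, long millis, List<String> log) {
        return new Thread(() -> {
            log.add(name + " start");
            System.out.println(name + " start");
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.add(name + " done");
            System.out.println(name + " done");
        }, name + "-hook");
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        Runtime runtime = Runtime.getRuntime();
        // Registration order does not determine the order in which hooks start.
        runtime.addShutdownHook(newTeardownHook("scheduler", 200, log));
        runtime.addShutdownHook(newTeardownHook("executor", 50, log));
        // Begins the shutdown sequence; both hooks are started concurrently.
        System.exit(0);
    }
}
```

Because the interleaving of the two hooks can differ between runs, a hook that
depends on another hook having finished first cannot rely on this mechanism.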
If we define the {{close()}} method as {{public}} and call it explicitly in the
body of {{main()}}, the tests do pass reliably. However, there seems to be some
conventional wisdom saying that defining/calling a method that calls
{{finalize()}} in that way is A Bad Thing. Any thoughts? If we decide that it
is acceptable to define a public {{close()}} method that calls {{finalize()}}
for the SchedulerDriver, similar to the one in your patch, and call it
explicitly just before we call {{System.exit()}}, then that would solve this
issue.
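
As a point of comparison, here is a rough sketch of an explicit {{close()}}
that does not go through {{finalize()}} at all. It assumes the native teardown
can be factored into its own method; {{NativeDriverSketch}} and
{{releaseNative()}} are hypothetical stand-ins, not the SchedulerDriver API.
Implementing {{AutoCloseable}} lets try-with-resources call {{close()}} before
{{System.exit()}}.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a driver that owns native resources.
class NativeDriverSketch implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger releases = new AtomicInteger(0);

    // Placeholder for the native teardown that finalize() would otherwise do.
    private void releaseNative() {
        releases.incrementAndGet();
        System.out.println("native resources released");
    }

    // Idempotent: only the first call performs the teardown.
    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            releaseNative();
        }
    }

    boolean isClosed() {
        return closed.get();
    }

    int releaseCount() {
        return releases.get();
    }

    // Runs the (elided) framework logic and returns an exit status.
    static int run() {
        try (NativeDriverSketch driver = new NativeDriverSketch()) {
            // Framework work would go here.
            return driver.isClosed() ? 1 : 0;
        } // close() is invoked here, before run() returns.
    }

    public static void main(String[] args) {
        int status = run();
        // Teardown has already happened by the time the JVM is asked to exit.
        System.exit(status);
    }
}
```

The compare-and-set guard means a later call from a finalizer or a second
caller would be a no-op rather than a double free.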
> JVM can exit prematurely following framework teardown
> -----------------------------------------------------
>
> Key: MESOS-3264
> URL: https://issues.apache.org/jira/browse/MESOS-3264
> Project: Mesos
> Issue Type: Bug
> Components: java api
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Greg Mann
> Priority: Minor
> Labels: java, tech-debt
>
> In Java frameworks, it is possible for the JVM to begin exiting the program
> (via {{System.exit()}}, for example) while teardown of native objects such as
> the SchedulerDriver and associated Executors is still in progress.
> {{SchedulerDriver::stop()}} returns once it has sent messages telling other
> actors to begin their teardown. Meanwhile, the JVM is free to terminate the
> program and thus begin executing native object destructors while those
> objects are still in use, potentially leading to a segfault.
> This has manifested itself in flaky tests from the ExamplesTest suite (see
> MESOS-830 and MESOS-1013), as mutexes from glog are destroyed while the
> framework is still shutting down and attempting to log.
> Ideally, a mechanism would exist to block the Java code until a confirmation
> that framework teardown is complete has been received.
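
One possible shape for such a blocking mechanism, sketched with a
{{CountDownLatch}}. This is only an illustration under assumptions:
{{TeardownBarrierSketch}} and {{onTeardownComplete()}} are hypothetical, and
the sketch assumes the native side could invoke some callback once teardown
has finished, which the issue does not confirm.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; it is not part of the Mesos Java API.
class TeardownBarrierSketch {
    private final CountDownLatch done = new CountDownLatch(1);

    // Assumed callback: invoked once framework teardown has completed.
    void onTeardownComplete() {
        done.countDown();
    }

    // Blocks until teardown is confirmed or the timeout elapses.
    // Returns true only if the confirmation arrived in time.
    boolean awaitTeardown(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        TeardownBarrierSketch barrier = new TeardownBarrierSketch();
        // Simulated asynchronous teardown running on another thread.
        Thread teardown = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            barrier.onTeardownComplete();
        });
        teardown.start();
        // Do not ask the JVM to exit until teardown is confirmed.
        boolean clean = barrier.awaitTeardown(5, TimeUnit.SECONDS);
        System.exit(clean ? 0 : 1);
    }
}
```

The timeout keeps a lost confirmation from hanging the JVM forever; the exit
status then indicates whether teardown was actually confirmed.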