[
https://issues.apache.org/jira/browse/HADOOP-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430747#comment-16430747
]
Jason Lowe commented on HADOOP-15372:
-------------------------------------
Thanks for the report, Miklos
bq. runCommand()}} can be called while/after Shell.getAllShells() returned all
the shells to be cleaned up. This new thread can avoid the clean up, so that
the process held by it can be leaked causing leaked localized files/etc.
Yes, there's a small bug where Shell#runCommand should be synchronizing on
CHILD_SHELLS as it starts the subprocess. That way it either won't be started
before destroyAllShells is called or it will be part of the list as long as
it's been started. However I would argue it's outside the scope of the
getAllShells and destoryAllShells APIs to prevent all future shells from being
launched, as there may be a use case where someone wants to cleanup all current
shells but still launch future ones. Its job is to kill all active ones which
client code outside of Shell cannot do.
In the specific case of localizing, it looks like we need a second destroy pass
after awaiting the shutdown of the executor to catch any shell that was trying
to launch just as we destroyed the active ones.
{quote}I see another issue as well. [...] the try catch block does not cover
all instructions after the process is started, so for example we can exit the
thread and leak the process
{quote}
Yes that appears to be an issue as well.
[~ebadger] do you have some cycles to look into this?
> Race conditions and possible leaks in the Shell class
> -----------------------------------------------------
>
> Key: HADOOP-15372
> URL: https://issues.apache.org/jira/browse/HADOOP-15372
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.10.0, 3.2.0
> Reporter: Miklos Szegedi
> Priority: Minor
>
> YARN-5641 introduced some cleanup code in the Shell class. It has a race
> condition. {{Shell.
> runCommand()}} can be called while/after {{Shell.getAllShells()}} returned
> all the shells to be cleaned up. This new thread can avoid the clean up, so
> that the process held by it can be leaked causing leaked localized files/etc.
> I see another issue as well. {{Shell.runCommand()}} has a finally block with
> a {{
> process.destroy();}} to clean up. However, the try catch block does not cover
> all instructions after the process is started, so for example we can exit the
> thread and leak the process, if {{
> timeOutTimer.schedule(timeoutTimerTask, timeOutInterval);}} causes an
> exception.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]