[ https://issues.apache.org/jira/browse/STORM-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033948#comment-14033948 ]
ASF GitHub Bot commented on STORM-357:
--------------------------------------
GitHub user d2r opened a pull request:
https://github.com/apache/incubator-storm/pull/144
[STORM-357] Cleans workers-users file only when rmr is successful
We do not check the exit code of the "worker launcher" script, but even if
we did, it returns 255 indiscriminately, which prevents us from distinguishing
failure modes.

This change adds a specialized function for calling "rmr" that throws an
exception if, after the script returns, the worker root directory still
exists.
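A minimal sketch of the idea behind the new function. Storm's supervisor is written in Clojure; this example uses Java and is not the code from the pull request. It assumes the launcher's "rmr" command can be run as an external process whose argument list ends with the directory to remove. The class RmrAsUser, the method rmrAsUser and the launcherPrefix parameter are hypothetical. Because the exit code cannot distinguish failure modes, the function checks whether the directory still exists after the process returns and throws if it does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: launcherPrefix stands in for however the worker
// launcher is invoked with its "rmr" command; it is not Storm's real API.
final class RmrAsUser {

    static void rmrAsUser(List<String> launcherPrefix, Path dir) throws IOException {
        List<String> cmd = new ArrayList<>(launcherPrefix);
        cmd.add(dir.toString());
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode;
        try {
            exitCode = p.waitFor();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IOException("interrupted while waiting for rmr of " + dir, e);
        }
        // The exit code alone is not trusted (the launcher returns 255 for any
        // failure), so verify the outcome directly on the file system.
        if (Files.exists(dir)) {
            throw new IOException("rmr of " + dir + " failed (exit code "
                    + exitCode + "); directory still exists");
        }
    }
}
```

Checking the file system after the process exits makes the result independent of how the launcher reports errors. The trade-off is that the caller still learns only that removal failed, not why.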
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/d2r/incubator-storm STORM-357
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-storm/pull/144.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #144
----
commit d1ba4fc4acdadd5e5e138395bdc5892dfdb88bff
Author: Derek Dagit
Date: 2014-06-17T15:56:51Z
Do not clean up user file when rmr is unsuccessful
----
> [security] Supervisors can fail to clean up worker files properly
> -----------------------------------------------------------------
>
> Key: STORM-357
> URL: https://issues.apache.org/jira/browse/STORM-357
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Reporter: Derek Dagit
> Assignee: Derek Dagit
>
> The "worker launcher" script is used to perform a variety of tasks as a
> specific user. This requires launching a separate process.
> After a worker is shut down, the supervisor uses the "rmr" command of the
> "worker launcher" script to clean up after the worker. This command can
> fail for any number of reasons, just as backtype.storm.util/rmr can fail.
> However, on failure the "worker launcher" script merely sets a non-zero exit
> code for the process, and that does not result in a thrown exception.
> As a result, the clean-up code in supervisor.clj does not detect the failure
> and proceeds to delete the corresponding file in workers-users. That file is
> required for any subsequent cleanup attempt that does not involve
> intervention by a privileged user.
> The symptom is repeated warning messages that cleanup fails because the
> original user is unknown. These messages cause the log files to roll, and
> the dead worker directories that are never removed can fill the disk.
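The ordering problem described in the issue can be sketched as follows, assuming a removal step that throws an IOException whenever the worker directory is not actually removed. This is an illustration in Java, not the supervisor.clj code; the names WorkerCleanupSketch, DirRemover and cleanUpWorker are hypothetical, and the workers-users entry is modeled as a single file path. The workers-users file is deleted only after removal succeeds, so a failed attempt keeps the information a later retry needs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the cleanup ordering; not Storm's actual code.
final class WorkerCleanupSketch {

    /** Hypothetical stand-in for running the launcher's "rmr" as the worker's user. */
    interface DirRemover {
        void remove(Path dir) throws IOException;
    }

    /**
     * Removes the worker directory first. Returns false (and keeps userFile)
     * if removal fails; otherwise deletes userFile and returns true.
     */
    static boolean cleanUpWorker(Path workerRoot, Path userFile, DirRemover remover)
            throws IOException {
        try {
            remover.remove(workerRoot);
        } catch (IOException e) {
            // Keep the user file so that a later attempt still knows the owner.
            System.err.println("Cleanup of " + workerRoot + " failed; keeping "
                    + userFile + ": " + e.getMessage());
            return false;
        }
        Files.deleteIfExists(userFile);
        return true;
    }
}
```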
--
This message was sent by Atlassian JIRA
(v6.2#6252)