Derek Dagit created STORM-357:
---------------------------------
Summary: [security] Supervisors can fail to clean up worker files properly
Key: STORM-357
URL: https://issues.apache.org/jira/browse/STORM-357
Project: Apache Storm (Incubating)
Issue Type: Bug
Reporter: Derek Dagit
Assignee: Derek Dagit
The "worker launcher" script is used to perform a variety of tasks as a
specific user. This requires launching a separate process.
After a worker is shut down, the supervisor uses the "worker launcher" script
to clean up after workers with its "rmr" command. This command could fail for
any number of reasons, just as backtype.storm.util/rmr could fail. But the
"worker launcher" script merely sets the exit code of the process to non-zero,
and that does not result in a thrown exception.
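The gap described above can be illustrated with a minimal Java sketch. It is not Storm's code: the class name LauncherExit and the helper runOrThrow are hypothetical, and the example assumes a POSIX "sh" is on the PATH to stand in for the "worker launcher" script. It only shows how a non-zero exit code from a child process can be turned into an exception so that callers cannot ignore it.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class LauncherExit {
    // Hypothetical helper: runs a command and converts a non-zero exit
    // code into an IOException instead of silently returning.
    static void runOrThrow(List<String> command)
            throws IOException, InterruptedException {
        // inheritIO() forwards the child's output so its pipes cannot fill up.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(
                "Command " + command + " exited with code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for a failing "rmr" invocation (assumption: sh exists).
            runOrThrow(Arrays.asList("sh", "-c", "exit 3"));
        } catch (IOException e) {
            System.err.println("Cleanup command failed: " + e.getMessage());
        }
    }
}
```

With a wrapper of this kind, a failed launcher invocation would surface the same way a failure in an in-process delete does, rather than being visible only in the exit code.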
As a result, logic in the supervisor.clj clean-up code is bypassed, and the
supervisor proceeds to delete the worker's file in workers-users. That file is
needed for any subsequent cleanup attempt to succeed without intervention by a
privileged user.
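One possible ordering that avoids losing the user record is sketched below in Java. This is an illustration under assumptions, not the fix applied to Storm: WorkerCleanup, Remover, and cleanUp are hypothetical names, and the user file is modeled as a plain path. The idea is simply that the workers-users entry is deleted only after the recursive delete has reported success.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WorkerCleanup {
    // Hypothetical abstraction over the recursive delete, e.g. a call
    // into the "worker launcher" script that throws on failure.
    interface Remover {
        void rmr(Path dir) throws IOException;
    }

    // Returns true only if the worker directory was removed and the
    // user file was then deleted. On failure the user file is kept so
    // that a later attempt still knows which user owns the files.
    static boolean cleanUp(Path workerDir, Path userFile, Remover remover) {
        try {
            remover.rmr(workerDir);
        } catch (IOException e) {
            System.err.println("rmr failed, keeping " + userFile + ": "
                + e.getMessage());
            return false;
        }
        try {
            Files.deleteIfExists(userFile);
        } catch (IOException e) {
            System.err.println("Could not delete " + userFile + ": "
                + e.getMessage());
            return false;
        }
        return true;
    }
}
```

Because the user file survives a failed attempt, the supervisor could retry the cleanup later as the original user instead of requiring a privileged user to step in.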
The symptom is repeated messages warning that cleanup failed because the
original user is unknown. These messages cause log files to roll, and the dead
worker directories that are left behind can fill the disk.