Jay Buffington created AURORA-367:
-------------------------------------
Summary: simple commands using sudo do not go through FINALIZING state
Key: AURORA-367
URL: https://issues.apache.org/jira/browse/AURORA-367
Project: Aurora
Issue Type: Bug
Components: Executor
Reporter: Jay Buffington
Priority: Minor
The .aurora file below causes the FINALIZING state to be skipped. The task
goes through these states:
{noformat}
ACTIVE 04/28 18:33:29
ACTIVE 04/28 18:35:51
CLEANING 04/28 18:35:51
KILLED 04/28 18:36:51
{noformat}
Here's the definition of the job:
{noformat}
$ cat fail_finalize.aurora
jobs = [Job(
task=SimpleTask(name="fail_finalize", command="""
sudo sleep 600
"""),
role='jaybuff',
environment="prod",
cluster="vp21d01cp")]
{noformat}
{noformat}
$ aurora inspect vp21d01cp/jaybuff/prod/fail_finalize fail_finalize.aurora
Job level information
name: fail_finalize
role: jaybuff
contact: <class 'pystachio.composite.Empty'>
cluster: vp21d01cp
instances: 1
service: False
production: False
Task level information
name: fail_finalize
Process fail_finalize:
cmdline:
sudo sleep 600
$
{noformat}
It looks like this is caused by bash exec'ing "simple commands" directly rather
than doing a fork+exec, as it does with a "complex command" like {{sudo sleep 600; 1}}.
To demonstrate what I'm talking about:
{{/bin/bash -c "(sudo sleep 400)"}} forces the command to run in a subshell
(effectively doing a fork+exec), so we see this:
{noformat}
$ ps afo pid,user,cmd
PID USER CMD
5320 jaybuff -bash
18651 jaybuff \_ /bin/bash -c (sudo sleep 400)
18652 root \_ sudo sleep 400
18653 root \_ sleep 400
{noformat}
Whereas {{/bin/bash -c "sudo sleep 400"}} doesn't use a subshell:
{noformat}
$ ps afo pid,user,cmd
PID USER CMD
5320 jaybuff -bash
19805 root \_ sudo sleep 400
19806 root \_ sleep 400
{noformat}
The problem with this is that when the executor goes to kill the task it sends
a SIGTERM to the pid it forked. If bash exec'ed sudo instead of forking a
subshell, that pid belongs to a process running as root. The SIGTERM comes from
an unprivileged user, so it has no effect. I suspect that after some timeout the
executor is killed by mesos-slave and the FINALIZING state is never reached.
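
The permission problem can be checked without sending a real signal. An unprivileged process is normally not allowed to signal a process running as root; on POSIX systems the kill call fails with EPERM. Below is a minimal sketch in Python, assuming a POSIX system. {{can_signal}} is a hypothetical helper name used only for illustration; it is not part of the Aurora executor.

```python
import os


def can_signal(pid):
    """Return True if the current user is allowed to signal `pid`.

    Signal 0 delivers nothing; the kernel only performs the existence
    and permission checks that a real SIGTERM would go through.
    """
    try:
        os.kill(pid, 0)
    except PermissionError:
        # EPERM: the target belongs to another user (e.g. root via sudo).
        return False
    except ProcessLookupError:
        # ESRCH: no such process.
        return False
    return True


if __name__ == "__main__":
    # Our own process can always be signalled by us.
    print(can_signal(os.getpid()))
```

Run as an unprivileged user against the pid of the exec'ed sudo process, a check like this would be expected to return False, which would match the behaviour described above.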
There doesn't appear to be a command line flag to tell bash to always use a
subshell. The best option I could find is in the bash manual, which says
"Placing a list of commands between parentheses causes a subshell environment
to be created". See
http://www.gnu.org/software/bash/manual/bashref.html#Command-Grouping
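
As a sketch of that workaround, the command string could be wrapped in parentheses before it is placed in the job definition. {{force_subshell}} is a hypothetical helper, not an Aurora API, and this has not been verified against the executor; it only illustrates the command-grouping syntax quoted from the bash manual.

```python
def force_subshell(cmdline):
    """Wrap a shell command line in parentheses so bash runs it in a subshell.

    Hypothetical helper for illustration; not an Aurora API.
    """
    stripped = cmdline.strip()
    if not stripped:
        raise ValueError("empty command line")
    # Assumes the command does not end in a shell comment, which would
    # swallow the closing parenthesis.
    return "(" + stripped + ")"


if __name__ == "__main__":
    # The command from fail_finalize.aurora, wrapped for SimpleTask(command=...).
    print(force_subshell("\nsudo sleep 600\n"))
```

The result, {{(sudo sleep 600)}}, has the same form as the {{/bin/bash -c "(sudo sleep 400)"}} case shown earlier, where the forked bash process stays owned by the unprivileged user.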