Jay Buffington created AURORA-367:
-------------------------------------

             Summary: simple commands using sudo do not go through FINALIZING state
                 Key: AURORA-367
                 URL: https://issues.apache.org/jira/browse/AURORA-367
             Project: Aurora
          Issue Type: Bug
          Components: Executor
            Reporter: Jay Buffington
            Priority: Minor


The .aurora file below causes the FINALIZING state to be skipped.  The task goes
through these states:

{noformat}
    ACTIVE   04/28 18:33:29
    ACTIVE   04/28 18:35:51
    CLEANING     04/28 18:35:51
    KILLED   04/28 18:36:51
{noformat}


Here's the definition of the job:
{noformat}
    $ cat fail_finalize.aurora
    jobs = [Job(
      task=SimpleTask(name="fail_finalize", command="""
         sudo sleep 600
    """),
      role='jaybuff',
      environment="prod",
      cluster="vp21d01cp")]
{noformat}
{noformat}
    $ aurora inspect  vp21d01cp/jaybuff/prod/fail_finalize fail_finalize.aurora
    Job level information
      name:       fail_finalize
      role:       jaybuff
      contact:    <class 'pystachio.composite.Empty'>
      cluster:    vp21d01cp
      instances:  1
      service:    False
      production: False

    Task level information
      name: fail_finalize

    Process fail_finalize:
      cmdline:

             sudo sleep 600

    $
{noformat}

This appears to be caused by bash exec'ing "simple commands" directly rather than 
doing a "fork+exec" as it does for a "complex command" like {{sudo sleep 600; 1}}.
To demonstrate what I'm talking about:

{{/bin/bash -c "(sudo sleep 400)"}} forces the command to run in a subshell 
(effectively doing a fork+exec), so we see this:

{noformat}
    $ ps afo pid,user,cmd
      PID USER     CMD
     5320 jaybuff  -bash
    18651 jaybuff   \_ /bin/bash -c (sudo sleep 400)
    18652 root          \_ sudo sleep 400
    18653 root              \_ sleep 400
{noformat}


Whereas {{/bin/bash -c "sudo sleep 400"}} doesn't use a subshell:

{noformat}
    $ ps afo pid,user,cmd
      PID USER     CMD
     5320 jaybuff  -bash
    19805 root      \_ sudo sleep 400
    19806 root          \_ sleep 400
{noformat}


The problem with this is that when the executor goes to kill the task, it sends 
a SIGTERM to the pid it forked.  If bash did not use a subshell and exec'ed sudo 
directly, that pid is now a process running as root, and the SIGTERM comes from 
an unprivileged user, so it has no effect.  I suspect that after some timeout 
the executor is killed by mesos-slave and the FINALIZING state is never reached.
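
To make the failure mode concrete, here is a minimal Python sketch of what sending a SIGTERM looks like from the sender's side.  The helper name {{try_terminate}} is hypothetical and this is not the executor's actual code; it relies only on {{os.kill}}, which raises {{PermissionError}} when the kernel refuses to deliver a signal to a process owned by another user.

```python
import os
import signal
import subprocess
import sys


def try_terminate(pid):
    """Send SIGTERM to pid and report how the kernel responded."""
    try:
        os.kill(pid, signal.SIGTERM)
    except PermissionError:
        # EPERM: the target belongs to another user (e.g. root via sudo).
        return "not-permitted"
    except ProcessLookupError:
        # ESRCH: no process with that pid exists any more.
        return "no-such-process"
    return "sent"


if __name__ == "__main__":
    # Stand-in for a forked task: a child we own, so the signal goes through.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(try_terminate(child.pid))
    child.wait()
```

If the pid belonged to a root-owned sudo process instead, an unprivileged caller would be expected to get "not-permitted" back, which would match the task lingering until the timeout described above.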

There doesn't appear to be a command line flag that tells bash to always use a 
subshell.  The best option I could find is in the bash manual, which says 
"Placing a list of commands between parentheses causes a subshell environment 
to be created".  See 
http://www.gnu.org/software/bash/manual/bashref.html#Command-Grouping
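
As a rough illustration of the parentheses approach, the sketch below wraps a process's command line before handing it to {{bash -c}}.  The function names {{wrap_in_subshell}} and {{bash_argv}} are hypothetical, and how the executor actually builds its command line is an assumption here, not something taken from the Aurora source.

```python
def wrap_in_subshell(cmdline):
    """Wrap a command line in parentheses so bash treats it as a subshell group."""
    # The newlines keep a trailing "# comment" in cmdline from swallowing the ")".
    return "(\n" + cmdline + "\n)"


def bash_argv(cmdline):
    """Build the argv for running the wrapped command line with bash -c."""
    return ["/bin/bash", "-c", wrap_in_subshell(cmdline)]
```

With this, {{bash_argv("sudo sleep 600")}} yields the equivalent of {{/bin/bash -c "(sudo sleep 600)"}}, the form that kept the unprivileged bash process as the parent in the first ps listing.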



--
This message was sent by Atlassian JIRA
(v6.2#6252)
