Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594

Mike Dubman Mon, 23 Jun 2014 04:37:02 -0400 (EDT)

this commit does more harm then good.
we experience following:

- some child processes still running after timeout and mtt killed the job.


before this commit - it worked fine.
please revert and test more.



On Sat, Jun 21, 2014 at 3:30 PM, MPI Team <[email protected]> wrote:

> The branch, master has been updated
>        via  016088f2a0831b32ab5fd6f60f4cabe67e92e594 (commit)
>        via  7fb4c6a4c9d71be127ea53bd463178510577f71f (commit)
>        via  381ba177d835a54c3197d846f5a4edfc314efe27 (commit)
>        via  cfdd29de2012eeb7592706f00dd07a52dd48cf6b (commit)
>        via  940030ca20eb1eaf256e898b83866c1cb83aca5c (commit)
>       from  c99ed7c7b159a2cab58a251bd7c0dad8972ff901 (commit)
>
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
>
> - Log -----------------------------------------------------------------
>
> https://github.com/open-mpi/mtt/commit/016088f2a0831b32ab5fd6f60f4cabe67e92e594
>
> commit 016088f2a0831b32ab5fd6f60f4cabe67e92e594
> Author: Jeff Squyres <[email protected]>
> Date:   Sat Jun 21 04:58:45 2014 -0700
>
>     DoCommand: several fixes to kill_proc logic
>
>     1. Fix the kill(0, $pid) test to see if the process was still alive.
>
>     2. Rename _kill_proc() to _kill_proc_tree() to indicate that it's
>     really killing not only the PID in question, but also all of its
>     descendants.
>
>     3. In _kill_proc_tree(), change the order to kill the main PID first,
>     and ''then'' kill all the descendants.
>
>     The main use case is when killing mpirun: if we kill mpirun's
>     descendants ''first'', mpirun will detect its childrens' deaths and
>     then cleanup and exit.  Later, when MTT finally gets around to killing
>     mpirun, MTT will detect that mpirun is already dead and therefore emit
>     a confusing "mpirun died right at end of timeout" message.  This is
>     misleading at best; it doesn't indicate what actually happened.
>
>     However, if we kill mpirun first, it will take care of killing all of
>     its descendants.  MTT will therefore emit the right messages about
>     killing mpirun.  MTT will then redundantly try to kill a bunch of
>     now-nonexistent descendant processes of mpirun, but that's ok/safe.
>     We actually ''want'' this try-to-kill-mpirun's-descendants behavior to
>     handle the case when mpirun is misbehaving / not cleaning up its
>     descendants.
>
>     4. DoCommand() is used for more than launching mpirun, so pass down
>     $argv0 so that we can print the actual command name that is being
>     killed in various Verbose/Debug messages, not the hard-coded "mpirun"
>     string (which, in practice, was probably almost always correct, but
>     still...).
> ---
>  lib/MTT/DoCommand.pm | 78
> ++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 55 insertions(+), 23 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index 02cdb94..646ca31 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -2,7 +2,7 @@
>  #
>  # Copyright (c) 2005-2006 The Trustees of Indiana University.
>  #                         All rights reserved.
> -# Copyright (c) 2006-2013 Cisco Systems, Inc.  All rights reserved.
> +# Copyright (c) 2006-2014 Cisco Systems, Inc.  All rights reserved.
>  # Copyright (c) 2007-2008 Sun Microsystems, Inc.  All rights reserved.
>  # Copyright (c) 2007-2012 High Performance Computing Center Stuttgart,
>  #                         University of Stuttgart.  All rights reserved.
> @@ -57,23 +57,27 @@ sub DoCommand {
>      ($time_arg, $no_execute) = @_;
>  }
>
> +# This function is called for killing both mpirun and each of its
> +# descendants.  We really only want to see verbose output for when we
> +# kill mpirun itself, so only show output when the caller provides us
> +# with a $argv0 value.
>  sub _kill_proc_one {
> -    my ($pid) = @_;
> +    my ($pid, $argv0) = @_;
>
>      # How long to wait after each kill()
>      my $wait_time = 5;
>
>      # See if the proc is alive first
> -    my $kid;
> -    kill(0, $pid);
> -    $kid = waitpid($pid, WNOHANG);
> -    return "mpirun died right at end of timeout (MTT did not have to kill
> it)"
> -        if (-1 == $kid);
> +    my $num_alive = kill(0, $pid);
> +    return "$argv0 died right at end of timeout (MTT did not have to kill
> it)"
> +        if (0 == $num_alive);
>
>      # Try an easy kill
>      kill("TERM", $pid);
> -    Verbose("*** Killing mpirun with SIGTERM\n");
> +    Verbose("*** Killing $argv0 with SIGTERM\n")
> +        if (defined($argv0));
>      # Give mpirun some time to cleanup before we try to reap it.
> +    my $kid;
>      my $i = $wait_time;
>      while ($i > 0) {
>          sleep(1);
> @@ -85,45 +89,74 @@ sub _kill_proc_one {
>          # process no longer exists (i.e., we get -1 back from
>          # waitpid), or we successfully killed it (i.e., we got the PID
>          # back from waitpid).
> -        return "MTT killed mpirun via SIGTERM" if (0 != $kid);
> +        return "MTT killed $argv0 via SIGTERM" if (0 != $kid);
>
>          --$i;
>      }
> -    Verbose("** Kill TERM (after $wait_time seconds) didn't work!\n");
> +    Verbose("** Kill TERM (after $wait_time seconds) didn't work!\n")
> +        if (defined($argv0));
>
>      # That didn't work either.  Try SIGINT;
> -    Verbose("*** Killing mpirun with SIGINT\n");
> +    Verbose("*** Killing $argv0 with SIGINT\n")
> +        if (defined($argv0));
>      kill("INT", $pid);
>      my $i = $wait_time;
>      while ($i > 0) {
>          sleep(1);
>          $kid = waitpid($pid, WNOHANG);
> -        return "MTT killed mpirun via SIGINT" if (0 != $kid);
> +        return "MTT killed $argv0 via SIGINT" if (0 != $kid);
>          --$i;
>      }
> -    Verbose("** Kill INT (after $wait_time seconds) didn't work!\n");
> +    Verbose("** Kill INT (after $wait_time seconds) didn't work!\n")
> +        if (defined($argv0));
>
>      # Ok, now we're mad.  Be violent.
> -    Verbose("*** Now I'm mad.  Killing mpirun with SIGKILL\n");
> +    Verbose("*** Now I'm mad.  Killing $argv0 with SIGKILL\n")
> +        if (defined($argv0));
>      my $count = 0;
>      while (1) {
>          kill("KILL", $pid);
>          ++$count;
>          $kid = waitpid($pid, WNOHANG);
> -        return "MTT killed mpirun via $count SIGKILLs" if (0 != $kid);
> -        Verbose("** Kill KILL didn't work!  Sleeping and trying
> again...\n");
> +        return "MTT killed $argv0 via $count SIGKILLs" if (0 != $kid);
> +        Verbose("** Kill KILL didn't work!  Sleeping and trying
> again...\n")
> +            if (defined($argv0));
>          sleep(1);
>      }
>  }
>
>
> -sub _kill_proc {
> -    my ($pid) = @_;
> -    # kill the group, take the names later
> -    foreach my $process (descendant_processes($pid)) {
> +sub _kill_proc_tree {
> +    my ($pid, $argv0) = @_;
> +
> +    # Find all descendent processes of the main PID
> +    my @children = descendant_processes($pid);
> +
> +    # Kill the main PID.  Note that _kill_proc_one() will give the
> +    # process time to react/cleanup, so there's no need for an
> +    # additional delay after it returns.
> +    my $ret = _kill_proc_one($pid, $argv0);
> +
> +    # Now kill all the processes that descended from the base PID.
> +    #
> +    # Note: when mpirun is working properly (which is one of the
> +    # biggest use cases for DoCommand), this is redundant -- all the
> +    # children will already be dead (because killing mpirun will
> +    # ensure that all descendant processes are also killed).
> +    #
> +    # That being said, a) DoCommand() is used to launch more than just
> +    # mpirun, b) mpirun breaks sometimes and doesn't clean up its
> +    # descendants, and c) it's safe to call _kill_proc_one() on a PID
> +    # that is already dead.
> +    foreach my $process (@children) {
> +        Debug("=== killing child proc: $process->{pid},
> $process->{argv0}\n");
> +
> +        # Ignore the return.  For example, we don't care if the child
> +        # is already dead
>          _kill_proc_one($process->{pid});
>      }
> -    return _kill_proc_one($pid);
> +
> +    return $ret;
>  }
>
>
>  #--------------------------------------------------------------------------
> @@ -338,7 +371,6 @@ sub Cmd {
>
>      my $tokens = _quote_escape($cmd);
>
> -
>      my $pid;
>      if (! $no_execute) {
>
> @@ -568,7 +600,7 @@ sub Cmd {
>
>                      $done = 0;
>                  }
> -                $timeout_message = _kill_proc($pid);
> +                $timeout_message = _kill_proc_tree($pid, ${$tokens}[0]);
>
>                  # We don't care about the exit status if we timed out
>                  # -- fill it with a bogus value.
>
>
> https://github.com/open-mpi/mtt/commit/7fb4c6a4c9d71be127ea53bd463178510577f71f
>
> commit 7fb4c6a4c9d71be127ea53bd463178510577f71f
> Author: Jeff Squyres <[email protected]>
> Date:   Sat Jun 21 04:42:19 2014 -0700
>
>     DoCommand: replace obscure perl code with easier-to-grok version
>
>     The descendant_processes() and flatten() codes were written in a
>     nicely efficient yet very difficult to understand manner.  This commit
>     replaces these two subs with admittedly less efficient but much easier
>     to understand code.
>
>     The new code also saves the first token of the command for each PID so
>     that it can be used in Debug/Verbose statements (e.g., identify the
>     command that is being killed, etc.).
> ---
>  lib/MTT/DoCommand.pm | 63
> ++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 49 insertions(+), 14 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index c3be8a8..02cdb94 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -120,8 +120,8 @@ sub _kill_proc_one {
>  sub _kill_proc {
>      my ($pid) = @_;
>      # kill the group, take the names later
> -    foreach my $p (descendant_processes($pid)) {
> -        _kill_proc_one($p);
> +    foreach my $process (descendant_processes($pid)) {
> +        _kill_proc_one($process->{pid});
>      }
>      return _kill_proc_one($pid);
>  }
> @@ -527,7 +527,8 @@ sub Cmd {
>                  if (defined($timeout_backtrace_program) and
> !$got_backtrace) {
>                      $backtrace = "";
>                      if ( $timeout_before_backtrace_program ) {
> -                        foreach my $p (descendant_processes($pid)) {
> +                        foreach my $process (descendant_processes($pid)) {
> +                            my $p = $process->{pid};
>                              my $c = $timeout_before_backtrace_program;
>                              $c =~ s/%PID%/$p/g;
>                              Debug("_pre_backtrace cmd: $c\n");
> @@ -538,7 +539,8 @@ sub Cmd {
>                      $backtrace .=
> _get_backtrace($timeout_backtrace_program, $pid, $pre_pernode);
>
>                      if ( $timeout_after_backtrace_program ) {
> -                        foreach my $p (descendant_processes($pid)) {
> +                        foreach my $process (descendant_processes($pid)) {
> +                            my $p = $process->{pid};
>                              my $c = $timeout_after_backtrace_program;
>                              $c =~ s/%PID%/$p/g;
>                              Debug("_post_backtrace cmd: $c\n");
> @@ -738,19 +740,50 @@ sub _do_email_timeout_notification {
>      close(TIMEOUT_SENTINEL_FILE);
>  }
>
> -sub flatten {
> -    map{ (ref($_) eq "ARRAY") ? map{flatten($_)}@$_ : $_ } @_;
> +sub find_children {
> +    my ($pids, $start) = @_;
> +
> +    my @ret;
> +    # Scan the PID list and see if any other PIDs list $start as their
> +    # PPID.  If so, save that PID and then recursively look for *that*
> +    # PID's children.
> +    foreach my $pid (sort(keys(%{$pids}))) {
> +        if ($pids->{$pid}->{ppid} == $start) {
> +            push(@ret, {
> +                pid => $pid,
> +                argv0 => $pids->{$pid}->{argv0},
> +                 });
> +
> +            # Add this PID's descendants to the return array
> +            push(@ret, find_children($pids, $pid));
> +        }
> +    }
> +
> +    return @ret;
>  }
>
>  sub descendant_processes {
>      my ($base) = (@_, $$);
> -    my %parentage = (`ps -eo pid,ppid` =~ /\d+/g);
> -    my %reverse = map { ($_, [$_]) } %parentage;
> -    while (my ($pid,$ppid) = each %parentage){
> -        push @{$reverse{$ppid}}, $reverse{$pid};
> +    open(PS, "ps -eo pid,ppid,cmd|") || die "Can't open ps";
> +
> +    # Skip the title line
> +    <PS>;
> +
> +    # Read all the data lines
> +    my $pids;
> +    while (<PS>) {
> +        # Grab the PID, PPID, and first token of the command
> +        $_ =~ m/(\d+)\s+(\d+)\s+(\S+)/;
> +        $pids->{$1} = {
> +            pid => $1,
> +            ppid => $2,
> +            argv0 => $3,
> +        };
>      }
> -    shift @{$reverse{$base}};
> -    flatten($reverse{$base});
> +    close(PS);
> +
> +    # Find all the descendants of the $base PID
> +    return find_children($pids, $base);
>  }
>
>  sub _get_backtrace {
> @@ -774,7 +807,8 @@ sub _get_backtrace {
>
>              # Use ps -Af output to fetch the child pids,
>              # and grab a stack trace from each one
> -            foreach my $p (descendant_processes($pid)) {
> +            foreach my $process (descendant_processes($pid)) {
> +                my $p = $process->{pid};
>                  $gdb_cmd = "gdb - $p --command=$gdb_command_filename
> --batch";
>                  $ret .= "\n $gdb_cmd";
>                  $ret .= "\n" . `$gdb_cmd`;
> @@ -814,8 +848,9 @@ sub _get_backtrace {
>                  }
>              }
>              Debug("Stacktrace: base name $return_basename\n");
> -            #foreach my $p (descendant_processes($pid))
> +            #foreach my $process (descendant_processes($pid))
>              #{
> +            #    my $p = $process->{pid};
>              #    my $gstack_cmd = "gstack $p";
>              #    $ret .= "\n $gstack_cmd";
>              #    $ret .= "\n" . `$gstack_cmd`;
>
>
> https://github.com/open-mpi/mtt/commit/381ba177d835a54c3197d846f5a4edfc314efe27
>
> commit 381ba177d835a54c3197d846f5a4edfc314efe27
> Author: Jeff Squyres <[email protected]>
> Date:   Sat Jun 21 03:48:55 2014 -0700
>
>     DoCommand: ensure to only kill the process once
>
>     When the DoCommand timeout fires, MTT will kill the entire process
>     tree of the launched command.  However, the stdout/stderr sockets may
>     still be draining, and therefore the main DoCommand loop may still
>     execute more iterations.  In cases like these, ensure that we don't
>     try to backtrace/kill the process tree again.
>
>     Also add a few more Debug statements for detailed output.
> ---
>  lib/MTT/DoCommand.pm | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index 0e253f0..c3be8a8 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -471,6 +471,7 @@ sub Cmd {
>              my $len = sysread(OUTread, $data, 99999);
>              if (0 == $len) {
>                  vec($rin, fileno(OUTread), 1) = 0;
> +                Debug("*** Child process stdout closed\n");
>                  --$done;
>              } else {
>                  _append($data, $print_timestamp ? localtime() : "",
> @@ -485,6 +486,7 @@ sub Cmd {
>              my $len = sysread(ERRread, $data, 99999);
>              if (0 == $len) {
>                  vec($rin, fileno(ERRread), 1) = 0;
> +                Debug("*** Child process stderr closed\n");
>                  --$done;
>              } else {
>                  _append($data, $print_timestamp ? localtime() : "",
> @@ -493,11 +495,19 @@ sub Cmd {
>              }
>          }
>
> -        # If we're running with a timeout, bail if we're past the end
> -        # time
> -        if (defined($end_time) && time() > $end_time) {
> +        # If we're running with a timeout, check to see if a) the
> +        # process is still running (i.e., stdout/stderr is still
> +        # open), and b) we're past the end time.
> +        if ($done > 0 && defined($end_time) && time() > $end_time) {
>              my $over = time() - $end_time;
> -            if ($over > $last_over) {
> +
> +            # Note that we only want to backtrace/kill the process
> +            # *once*.  Consider: it is possible that we kill the
> +            # process tree, but then stdout/stderr are still draining,
> +            # and therefore we loop around here again *even though the
> +            # process tree is already dead*.  So put a little
> +            # do-this-only-once protection in here.
> +            if (!defined($timeout_message) && $over > $last_over) {
>                  Verbose("*** Past timeout of $timeout seconds by $over
> seconds\n");
>
>                  # Handle timeout file
> @@ -555,10 +565,8 @@ sub Cmd {
>                      );
>
>                      $done = 0;
> -                    $timeout_message = _kill_proc($pid);
> -                } else {
> -                    $timeout_message = _kill_proc($pid);
>                  }
> +                $timeout_message = _kill_proc($pid);
>
>                  # We don't care about the exit status if we timed out
>                  # -- fill it with a bogus value.
> @@ -576,6 +584,7 @@ sub Cmd {
>              }
>          }
>      }
> +    Debug("*** Child process now dead\n");
>      close OUTerr;
>      close OUTread
>          if (!$merge_output);
>
>
> https://github.com/open-mpi/mtt/commit/cfdd29de2012eeb7592706f00dd07a52dd48cf6b
>
> commit cfdd29de2012eeb7592706f00dd07a52dd48cf6b
> Author: Jeff Squyres <[email protected]>
> Date:   Sat Jun 21 03:36:08 2014 -0700
>
>     DoCommand: remove unused variables
> ---
>  lib/MTT/DoCommand.pm | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/lib/MTT/DoCommand.pm b/lib/MTT/DoCommand.pm
> index 0740b28..0e253f0 100644
> --- a/lib/MTT/DoCommand.pm
> +++ b/lib/MTT/DoCommand.pm
> @@ -445,7 +445,6 @@ sub Cmd {
>          $end_time = time() + $timeout;
>          Debug("Timeout: $timeout - $end_time (vs. now: " . time() .
> ")\n");
>      }
> -    my $killed_status = undef;
>      my $last_over = 0;
>      my $partial_out;
>      my $partial_err;
> @@ -900,7 +899,6 @@ sub Win_Cmd {
>          $end_time = time() + $timeout;
>          Debug("Timeout: $timeout - $end_time (vs. now: " . time() .
> ")\n");
>      }
> -    my $killed_status = undef;
>      my $last_over = 0;
>
>      while(<OUTread>) {
>
>
> https://github.com/open-mpi/mtt/commit/940030ca20eb1eaf256e898b83866c1cb83aca5c
>
> commit 940030ca20eb1eaf256e898b83866c1cb83aca5c
> Author: Jeff Squyres <[email protected]>
> Date:   Fri Jun 20 17:13:08 2014 -0700
>
>     Diskfree.pm: fix df handle leak
>
>     This left a zombied "df" process in the process table.  The Perl
>     runtime eventually cleaned it up, but it was weird to see a zombied df
>     in ps listings.
>
>     There's no point in reporting this bug upstream; according to the
>     Changes file, v0.06 was published in 1998.  There's been no activity
>     on http://search.cpan.org/dist/Filesys-DiskFree/DiskFree.pm since
>     v0.06.
> ---
>  lib/Filesys/DiskFree.pm | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Filesys/DiskFree.pm b/lib/Filesys/DiskFree.pm
> index 1fa655d..41a27d7 100644
> --- a/lib/Filesys/DiskFree.pm
> +++ b/lib/Filesys/DiskFree.pm
> @@ -124,8 +124,9 @@ sub df(){
>      $cmd=$self->command() or
>         croak "No df command known for format ".$self->{'FORMAT'};
>      open(HANDLE,"$cmd|") or croak("Cannot fork \"$cmd\": $!");
> -    return $self->load(\*HANDLE);
> +    my $ret = $self->load(\*HANDLE);
>      close(HANDLE) or croak("Cannot df $!");
> +    return $ret;
>  }
>
>  sub load()  {
>
> -----------------------------------------------------------------------
>
> Summary of changes:
>  lib/Filesys/DiskFree.pm |   3 +-
>  lib/MTT/DoCommand.pm    | 162
> +++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 120 insertions(+), 45 deletions(-)
>
>
> hooks/post-receive
> --
> MTT
> _______________________________________________
> mtt-svn mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-svn
>

Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594

Reply via email to