Mongers,

I recently encountered a puzzling dilemma. You might find it interesting, or obvious (probably not both) and it leads to a question about how perl handles signals.

Here's the situation:

I teach a class for Portland's adult education class. It's a PHP class. (boo, hiss). I created a web site on my own server where students can enter php code, submit it, have it executed and see the result in an output window. (Yes, the security implications are severe. But that's not what I'm writing about :-).

The CGI script that handles the code submission is a perl script. It creates a temp file, builds a command line with some PHP-native safety features and then hands the file to PHP to execute via qx().

Anyhow, this has worked well for years. Inevitably during our coverage of looping (i.e. with "while") students will write infinite loops. I thought I was handling these situations pretty well with a timeout mechanism based on perl's alarm(). Here is basically how it worked, to paraphrase:

                $SIG{'ALRM'} = sub {die "timeout"};

my $command = 'php -d safe_mode=on -d open_basedir=/home/user ' . $file;

                alarm 5;
                eval {
                        $output = qx($command);
                };
                if ($@) {
                        # tell user their process exceeded the time limit
                }

On more than one recent occasion I found this didn't work. (Cut here to say "I told you so.")

In researching the problem I isolated the conditions that led to the problem, and also showed that it's not specific to PHP. But I still don't fully understand the reason for it.

Here is a test program I wrote (with help from http://perlguru.com/gforum.cgi?post=40217 ) to emulate the behavior of the CGI script:

----------------------------

#!/usr/bin/perl

use Errno (ESRCH, EPERM);

print "Content-type: text/plain\n\n" if $ENV{'GATEWAY_INTERFACE'};

$SIG{'ALRM'} = sub {die "Throwing time_out"};

my $file = '/path/to/inf_loop.php';

my $command = 'php ' . $file;

my $pid = open(CMD, '-|'); # going to read child's STDOUT

defined ($pid) or die "Can't fork: $!";

if ($pid) { # parent
        alarm 5;
        eval {
                $output = join '', <CMD>;
        };
        if ($@) {
                print "Caught time_out\n";
                kill_pid($pid);
        }
}
else { # child
        exec($command) or die;
}

sub kill_pid {
        my $pid = shift;
        if (kill 0, $pid) {
                print "Killing $pid\n";
                kill 'KILL', $pid;
        }
        elsif ($! == EPERM) {
                print "I'm not allowed to signal $pid!\n";
        }
        elsif ($! == ESRCH) {
                print "$pid is deceased\n";
        }
        else {
                print "Unexpected error, can't kill $pid\n";
        }
}

----------------------------

The contents of the PHP file "inf_loop.php" varied as I tried different things.

TEST 1:

while (1) {
  $n++;
}

RESULT:

After 5 seconds, the browser received this response:

Caught time_out
Killing 7884

----------------------------

TEST 2:

while (1) {
  print $n++;
}

RESULT:

The browser spun its wheels indefinitely, waiting... while in "top" I see:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8446 ftpbog1   20   0  192m 8640 5496 R   52  0.2   0:02.06 php
 8445 ftpbog1   20   0 38380  23m 1320 S   48  0.6   0:02.00 test2.pl
 8444 ftpbog1   20   0  237m  10m 2156 S    0  0.3   0:00.00 apache2

The test2.pl process is consuming massive amounts of memory very quickly and continues to run indefinitely. Finally I run...

jet:/home/bogart# sudo -u ftpbog1 kill 8446

... and almost simultaneously the response comes back to the browser:

Caught time_out
Killing 8446

This tells me that the failure to terminate after 5 seconds is not due to a permissions issue.

----------------------------

TEST 3:

For this test I see if a perl child "inf_loop.pl" would produce the same result, with (literally) the same code:

while (1) {
  print $n++;
}

RESULT: 5 out of 5 times the process is killed after 5 seconds, though not before consuming 30 MB of RAM.

----------------------------

TEST 4:

For this test I unbuffered STDOUT in "inf_loop.pl" with:

$| = 1;

RESULT: Now the result matches the result from the PHP file. The child perl process runs indefinitely until killed with:

jet:/home/bogart# sudo -u ftpbog1 kill 10190

... at which point the browser receives:

Caught time_out
Killing 10190

NOTE: All of the subsequent tests are run with the perl child unbuffered. The PHP child is unbuffered by default, presumably.

----------------------------

TEST 5:

Added a call to sleep().

while (1) {
  print $n++;
  sleep(1);
}

RESULT: Consistent kill after 5 seconds, for both the PHP child and the perl child.

----------------------------

TEST 6:

Print newlines:

while (1) {
  print $n++ . "\n";
}

RESULT: Both the PHP and perl children run indefinitely. Granted, the reader was trying to read to EOF all at once...

----------------------------

TEST 7:

Read in a loop:

while (<CMD>) {
        $output .= $_;
}

RESULT: Consistent kills for both children. They're printing newlines but not sleeping.

----------------------------

TEST 8:

Read in a loop, but don't have the children print newlines.

RESULT: As you'd suspect, they run indefinitely. The readline operator keeps gobbling up RAM as fast as it can.

----------------------------

TEST 9:

Use read():

1 while ($bytes = read(CMD, $output, 1024, $total += $bytes));

RESULT: Consistent kills. Neither child was sleeping or printing newlines. Both were attempting to print one unbuffered endless line as fast as possible.

----------------------------

TEST 10:

What if I give the child something else to do in the loop? I tried this several times with additional "extra work" each time. And I went back to line-based reading, but in a loop, i.e. "while (<CMD>) ..."

while (1) {
        print $n++;
        1;
}

RESULT: Testing just the perl process now, it runs indefinitely.

while (1) {
        print $n++;
        $m = rand();
}

RESULT: Indefinite run time.

while (1) {
        print $n++;
        $m = rand() * rand() * rand() * rand() * rand() / $n;
}

RESULT: Indefinite run time.

while (1) {
        print $n++;
        1 while $m++ % 100;
}

RESULT: Killed on the 1st, 3rd and 5th run. Ran indefinitely on the 2nd and 4th run.

while (1) {
        print $n++;
        my $m;
        $m++ until $m == $n;
}

RESULT: Killed 5 times out of 5

while (1) {
        print $n++;
        $m = $n;
        while ($m--) {
                $o = rand() * rand();
        }
}

RESULT: Killed 5 times out of 5

----------------------------

TEST 11:

Had the $SIG{'ALRM'} sub and the exception handler print the time at which they executed.

$SIG{'ALRM'} = sub {die "Throwing time_out at " . scalar(localtime())};

... and ...

if ($@) {
        print "Caught error ($@) at " . scalar(localtime()) . "\n";
        kill_pid($pid);
}

RESULT: I reverted the inf_loop.pl script to a known "infinite" personality. After running it for about 30 seconds I killed it from the command line and found that the times reported were simultaneous with the manual kill, long after the 5-second alarm signal handler should have been executed.

----------------------------

ANALYSIS:

There are some circumstances in which an alarm is unlikely to be handled by perl at the intended time, or soon after. (When I say the tests ran "indefinitely", it means I let them run for 30 seconds to a minute before killing them manually. The alarm was set for five seconds). The probability depends on a variety of factors, including:

- Unbuffered output. (Compare tests 3 and 4).
- Line-based reading for the parent. (See test 9).
- The child writing something. (See test 1).
- The child writing without newlines (Compare tests 7 and 8, note array context for <>). - The child doing something other than "print $n++", though it depends on how much. (See tests 5 and 10).

The problem is not that the child *can't* be signaled. It appears that the problem (see test 11) is that under certain circumstances perl doesn't execute the parent's $SIG{'ALRM'} handler when it should. The signal is "there", waiting, but it doesn't get through until the child exits.

QUESTIONS:

Does these results seem unusual or unexpected? What bugs me is that, for instance in test 10, the success of the alarm seems somewhat arbitrary. The fourth of the sub-tests under test 10 only ran indefinitely *some* of the time.

What is happening internally that could explain this? I don't know how alarm() is implemented under the hood. I've heard of "uninterruptable sleep" associated with disk I/O problems, but I didn't think it was related to RAM. (Swap is presumably not in use here).

Do any of you typically use a particular "defensive programming" tactic to avoid these kinds of issues? It seems that reading with "read ()" might be a good idea.

Thanks.

Bogart




_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm

Reply via email to