Hello,

I am running SLES 7 (2.4.7 kernel) under z/VM 4.2.  Last night we had a
script go into a tight loop. - This was bad...

What was worse, is that the guest involved seems to have had a massive
shift in its clock as a result of this.  I was notified since several
scripts scheduled to run via CRON did not in fact run.  When I got in and
checked, the guest running the script, was 7 hours behind (8:30AM Real Time
was around 1:30AM Guest Time) hence why cron didn't do anything, since it
wasn't time to yet.

We have a process that runs top and dumps it to a file every 1 minute.  The
script in question kicked off at 1 minute after midnight.  It appears to
have begun looping at around 12:13am everything in the top log looks
reasonably normal up to that point.

12:13am  up 29 days, 23:47,  0 users,  load average: 1.07, 0.97, 0.56
143 processes: 141 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  0.6% user,  0.4% system,  0.0% nice,  1.4% idle
CPU0 states: 69.1% user, 30.0% system,  0.0% nice,  0.0% idle
CPU1 states: 30.0% user, 16.1% system,  0.0% nice, 52.1% idle
Mem:   254092K av,  250804K used,    3288K free,       0K shrd,    3884K
buff
Swap:  179260K av,   41080K used,  138180K free                   54032K
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
13385 prodlnx   11   0  9672 9672  5000 R    97.5  3.8  11:59 /usr/bin/perl
-w

The very next entry in the top log was this:

1:29am  up 30 days,  8:07,  0 users,  load average: 5.92, 2.88, 2.26
190 processes: 182 sleeping, 6 running, 2 zombie, 0 stopped
CPU states:  0.2% user,  0.4% system,  0.0% nice,  0.8% idle
CPU0 states: 22.0% user, 71.1% system,  0.0% nice,  6.3% idle
CPU1 states: 36.4% user, 61.0% system,  0.2% nice,  2.2% idle
Mem:   254092K av,  252036K used,    2056K free,       0K shrd,    2680K
buff
Swap:  179260K av,   41080K used,  138180K free                   51480K
cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
13385 prodlnx   20   0  9672 9672  5000 R    43.7  3.8 511:37 /usr/bin/perl
-w

A couple of things leap out at me from this:
1.) Box THINKS it 1:29AM - but really it's around 8:30AM ish.  NOTHING
seems to have occurred in the interim except our loop, there are no entries
in any logs (messages, etc), nothing.
2.) The run time for the perl script is about correct: 511:37 = 8hours
31min or so.  The script starts at 1 min after midnight, so SOMETHING is
keeping the time right??

We have been running NTP for quite some time to correct time drift errors,
however it doesn't seem to have done any good in this situation. (From
another thread on a similar issue, I understand that NTP doesn't cope with
massive time errors too well and that may be why it didn't help)

In case anyone wants to see if they can duplicate the issue, this is the
chunk of code in question.   It was written in perl, and had 20-30 records
on the stack, and the bad record was aprox number 4 or 5 in the queue.  The
printPopDetail function simply wrote out these records to a flat file.

if ($#transfer > -1) {          #If we have a transfer in storage
      @transfer = reverse @transfer;
      $poptransfer = pop(@transfer);
      doPopHeader($poptransfer);
      while ($#transfer > -1) {
            if ((!defined($poptransfer->[8])) || ($poptransfer->[8] eq 'I')
|| ($poptransfer->[8] eq 'D')) {
                  $poptransfer = pop(@transfer);
                  printPopDetail($poptransfer);
            }
      }
}

As you can see quite clearly this causes very bad things to happen when
$poptransfer[8] is not equal to I or D.

PLEASE, NO COMMENTS ON THE CODE, I didn't write it, and the folks who did
have added the ELSE statement already to fix the loop.

Thanks to all who lasted and actually read all of this mess!

-------------------------------------------
Jeremy Warren
remove the N0Sp@M_ to email me directly
mailto:N0Sp@[EMAIL PROTECTED]

Reply via email to