Thank you for your reply!  I went back and re-read your original post, and I
agree with your original assumption that there appears to be an issue during
high CPU utilization.

All of our guests are running the kernel timer patch and this is the first
time I have seen this, but it may have something to do with high utilization
over time....

-------------------------------------------
Jeremy Warren
remove the N0Sp@M_ to email me directly
mailto:N0Sp@[EMAIL PROTECTED]



"Sivey,Lonny" <[EMAIL PROTECTED]>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
01/16/2003 04:57 PM
Please respond to Linux on 390 Port

To:       [EMAIL PROTECTED]
cc:
Subject:  Re: [LINUX-390] Massive Time Shift




Jeremy,

I had a very similar problem that occurred every night on my syslog server.
I posted the problem here a few months ago and no one responded with any
ideas.  What solved it for me was to reinstall a version of the kernel
without the timer patch enabled.  After that the problem was gone.  I think
there's something not quite right with the timer patch.  That said, I have
132 other Linux images running on the same z/VM LPAR with the timer patch
enabled, and I have not seen the massive time shift on any of them.

Hope this helps,

Lonny

_____________________________________
Lonny Sivey
System Support Division
OCLC Online Computer Library Center, Inc.
6565 Frantz Rd, Dublin, OH 43017
(614) 764-6013  FAX (614) 718-7200
mailto:[EMAIL PROTECTED]
_____________________________________



-----Original Message-----
From: Jeremy Warren [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 16, 2003 3:31 PM
To: [EMAIL PROTECTED]
Subject: Massive Time Shift


Hello,

I am running SLES 7 (2.4.7 kernel) under z/VM 4.2.  Last night we had a
script go into a tight loop.  This was bad...

What was worse is that the guest involved seems to have had a massive
shift in its clock as a result.  I was notified because several scripts
scheduled to run via cron did not in fact run.  When I got in and checked,
the guest running the script was 7 hours behind (8:30AM real time was
around 1:30AM guest time), which is why cron didn't do anything: as far as
the guest was concerned, it wasn't time yet.

We have a process that runs top and dumps it to a file every minute.  The
script in question kicked off at 1 minute after midnight.  It appears to
have begun looping at around 12:13am; everything in the top log looks
reasonably normal up to that point.

12:13am  up 29 days, 23:47,  0 users,  load average: 1.07, 0.97, 0.56
143 processes: 141 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  0.6% user,  0.4% system,  0.0% nice,  1.4% idle
CPU0 states: 69.1% user, 30.0% system,  0.0% nice,  0.0% idle
CPU1 states: 30.0% user, 16.1% system,  0.0% nice, 52.1% idle
Mem:   254092K av,  250804K used,    3288K free,       0K shrd,    3884K buff
Swap:  179260K av,   41080K used,  138180K free                   54032K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
13385 prodlnx   11   0  9672 9672  5000 R    97.5  3.8  11:59 /usr/bin/perl -w

The very next entry in the top log was this:

1:29am  up 30 days,  8:07,  0 users,  load average: 5.92, 2.88, 2.26
190 processes: 182 sleeping, 6 running, 2 zombie, 0 stopped
CPU states:  0.2% user,  0.4% system,  0.0% nice,  0.8% idle
CPU0 states: 22.0% user, 71.1% system,  0.0% nice,  6.3% idle
CPU1 states: 36.4% user, 61.0% system,  0.2% nice,  2.2% idle
Mem:   254092K av,  252036K used,    2056K free,       0K shrd,    2680K buff
Swap:  179260K av,   41080K used,  138180K free                   51480K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
13385 prodlnx   20   0  9672 9672  5000 R    43.7  3.8 511:37 /usr/bin/perl -w

A couple of things leap out at me from this:
1.) The box THINKS it's 1:29AM, but really it's around 8:30AM.  NOTHING
seems to have occurred in the interim except our loop; there are no entries
in any logs (messages, etc.), nothing.
2.) The run time for the perl script is about correct: 511:37 = 8 hours
31 min or so.  The script started at 1 min after midnight, so SOMETHING is
keeping the time right??
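
To make the comparison concrete, here is a rough cross-check of the two top
snapshots.  It is only a sketch: the numbers are copied from the output
above, it assumes both snapshots fall on the same calendar day, and the TIME
column is CPU time rather than wall time (close enough here, since the
process was spinning).

```python
from datetime import timedelta

# Values copied from the two top snapshots quoted above.
uptime_before = timedelta(days=29, hours=23, minutes=47)
uptime_after = timedelta(days=30, hours=8, minutes=7)
cpu_before = timedelta(minutes=11, seconds=59)   # TIME column, PID 13385
cpu_after = timedelta(minutes=511, seconds=37)
wall_before = timedelta(hours=0, minutes=13)     # 12:13am on the guest clock
wall_after = timedelta(hours=1, minutes=29)      # 1:29am on the guest clock

uptime_elapsed = uptime_after - uptime_before
cpu_elapsed = cpu_after - cpu_before
wall_elapsed = wall_after - wall_before

# How much time the guest's wall clock lost relative to its own uptime.
missing = uptime_elapsed - wall_elapsed

print("uptime advanced:     ", uptime_elapsed)
print("perl CPU time gained:", cpu_elapsed)
print("wall clock advanced: ", wall_elapsed)
print("wall clock lost:     ", missing)
```

The uptime counter and the process CPU time agree with each other to within
a minute, and only the wall clock falls about 7 hours short, which matches
the 7 hour difference seen the next morning.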

We have been running NTP for quite some time to correct time drift errors;
however, it doesn't seem to have done any good in this situation.  (From
another thread on a similar issue, I understand that NTP doesn't cope with
massive time errors too well, and that may be why it didn't help.)
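
As an illustration of why a 7 hour error may be beyond what NTP will fix on
its own, here is a simplified model of how ntpd reacts to an offset.  It
assumes the documented ntpd defaults (a 128 ms step threshold and a 1000 s
panic threshold) applied to the version and configuration in use, which I
have not verified; the function name is made up for this sketch and is not
part of ntpd.

```python
# Assumed ntpd defaults; a local configuration may override them.
STEP_THRESHOLD_S = 0.128    # above this, ntpd steps the clock instead of slewing
PANIC_THRESHOLD_S = 1000.0  # above this, ntpd refuses to correct the clock


def ntpd_reaction(offset_seconds: float) -> str:
    """Hypothetical helper: classify an offset against the assumed defaults."""
    magnitude = abs(offset_seconds)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic"   # offset considered implausible; no correction made
    if magnitude > STEP_THRESHOLD_S:
        return "step"    # clock is set in one jump
    return "slew"        # clock is adjusted gradually


guest_offset = 7 * 3600  # the guest was about 7 hours behind
print(ntpd_reaction(guest_offset))
```

Under these assumptions a 25200 second offset lands well past the panic
threshold, so a running ntpd would not have been expected to pull the clock
back by itself.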

In case anyone wants to see if they can duplicate the issue, this is the
chunk of code in question.  The script is written in Perl; at the time there
were 20-30 records on the stack, and the bad record was approximately number
4 or 5 in the queue.  The printPopDetail function simply wrote these records
out to a flat file.

if ($#transfer > -1) {          #If we have a transfer in storage
      @transfer = reverse @transfer;
      $poptransfer = pop(@transfer);
      doPopHeader($poptransfer);
      while ($#transfer > -1) {
            if ((!defined($poptransfer->[8])) || ($poptransfer->[8] eq 'I')
                || ($poptransfer->[8] eq 'D')) {
                  $poptransfer = pop(@transfer);
                  printPopDetail($poptransfer);
            }
      }
}

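As you can see quite clearly, this causes very bad things to happen when
$poptransfer->[8] is defined and is not equal to 'I' or 'D': nothing gets
popped, so @transfer never shrinks and the while loop never ends.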

PLEASE, NO COMMENTS ON THE CODE, I didn't write it, and the folks who did
have added the ELSE statement already to fix the loop.

Thanks to all who lasted and actually read all of this mess!

-------------------------------------------
Jeremy Warren
remove the N0Sp@M_ to email me directly
mailto:N0Sp@[EMAIL PROTECTED]
