Thank you for your reply! I went back and re-read your original post, and I agree with your original assumption that there appears to be an issue during high CPU utilization.
All of our guests are running the kernel timer patch and this is the first time I have seen this, but it may be something to do with high utilization over time....

-------------------------------------------
Jeremy Warren
remove the N0Sp@M_ to email me directly
mailto:N0Sp@[EMAIL PROTECTED]

From: "Sivey,Lonny" <[EMAIL PROTECTED]>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
Date: 01/16/2003 04:57 PM
Please respond to: Linux on 390 Port
To: [EMAIL PROTECTED]
cc:
Subject: Re: [LINUX-390] Massive Time Shift

Jeremy,

I had a very similar problem that occurred every night on my syslog server. I posted the problem here a few months ago and no one responded with any ideas. What solved it for me was to reinstall a version of the kernel without the timer patch enabled. After that the problem was gone. I think there's something not quite right with the timer patch.

I have 132 other Linux images running on the same z/VM LPAR with the timer patch enabled, and on those I have not seen the problem of the massive time shift.

Hope this helps,
Lonny

_____________________________________
Lonny Sivey
System Support Division
OCLC Online Computer Library Center, Inc.
6565 Frantz Rd, Dublin, OH 43017
(614) 764-6013
FAX (614) 718-7200
mailto:[EMAIL PROTECTED]
_____________________________________

-----Original Message-----
From: Jeremy Warren [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 16, 2003 3:31 PM
To: [EMAIL PROTECTED]
Subject: Massive Time Shift

Hello,

I am running SLES 7 (2.4.7 kernel) under z/VM 4.2.
Last night we had a script go into a tight loop. This was bad... What was worse is that the guest involved seems to have had a massive shift in its clock as a result.

I was notified because several scripts scheduled to run via cron did not in fact run. When I got in and checked, the guest running the script was 7 hours behind (8:30AM real time was around 1:30AM guest time), hence why cron didn't do anything, since it wasn't time to yet.

We have a process that runs top and dumps it to a file every minute. The script in question kicked off at 1 minute after midnight. It appears to have begun looping at around 12:13am; everything in the top log looks reasonably normal up to that point.

    12:13am up 29 days, 23:47, 0 users, load average: 1.07, 0.97, 0.56
    143 processes: 141 sleeping, 2 running, 0 zombie, 0 stopped
    CPU states:   0.6% user,  0.4% system,  0.0% nice,  1.4% idle
    CPU0 states: 69.1% user, 30.0% system,  0.0% nice,  0.0% idle
    CPU1 states: 30.0% user, 16.1% system,  0.0% nice, 52.1% idle
    Mem:  254092K av, 250804K used,   3288K free, 0K shrd, 3884K buff
    Swap: 179260K av,  41080K used, 138180K free          54032K cached

      PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    13385 prodlnx   11   0  9672 9672  5000 R    97.5  3.8  11:59 /usr/bin/perl -w

The very next entry in the top log was this:

    1:29am up 30 days, 8:07, 0 users, load average: 5.92, 2.88, 2.26
    190 processes: 182 sleeping, 6 running, 2 zombie, 0 stopped
    CPU states:   0.2% user,  0.4% system,  0.0% nice,  0.8% idle
    CPU0 states: 22.0% user, 71.1% system,  0.0% nice,  6.3% idle
    CPU1 states: 36.4% user, 61.0% system,  0.2% nice,  2.2% idle
    Mem:  254092K av, 252036K used,   2056K free, 0K shrd, 2680K buff
    Swap: 179260K av,  41080K used, 138180K free          51480K cached

      PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    13385 prodlnx   20   0  9672 9672  5000 R    43.7  3.8 511:37 /usr/bin/perl -w

A couple of things leap out at me from this:

1.) Box THINKS it's 1:29AM - but really it's around 8:30AM ish.
NOTHING seems to have occurred in the interim except our loop; there are no entries in any logs (messages, etc.), nothing.

2.) The run time for the perl script is about correct: 511:37 = 8 hours 31 min or so. The script starts at 1 min after midnight, so SOMETHING is keeping the time right??

We have been running NTP for quite some time to correct time drift errors; however, it doesn't seem to have done any good in this situation. (From another thread on a similar issue, I understand that NTP doesn't cope with massive time errors too well, and that may be why it didn't help.)

In case anyone wants to see if they can duplicate the issue, this is the chunk of code in question. It was written in perl and had 20-30 records on the stack, and the bad record was approximately number 4 or 5 in the queue. The printPopDetail function simply wrote out these records to a flat file.

    if ($#transfer > -1) {    #If we have a transfer in storage
        @transfer = reverse @transfer;
        $poptransfer = pop(@transfer);
        doPopHeader($poptransfer);
        while ($#transfer > -1) {
            if ((!defined($poptransfer->[8])) || ($poptransfer->[8] eq 'I') || ($poptransfer->[8] eq 'D')) {
                $poptransfer = pop(@transfer);
                printPopDetail($poptransfer);
            }
        }
    }

As you can see quite clearly, this causes very bad things to happen when $poptransfer->[8] is defined but is not equal to 'I' or 'D': nothing gets popped, so the while loop never ends.

PLEASE, NO COMMENTS ON THE CODE, I didn't write it, and the folks who did have added the ELSE statement already to fix the loop.

Thanks to all who lasted and actually read all of this mess!

-------------------------------------------
Jeremy Warren
remove the N0Sp@M_ to email me directly
mailto:N0Sp@[EMAIL PROTECTED]
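
For anyone who wants to check the numbers in the two top entries quoted above, here is a rough sketch of the arithmetic in Python. It assumes that top's TIME column is accumulated CPU time in minutes:seconds and that the uptime field is days plus hours:minutes; the helper names are made up for illustration and are not part of any tool mentioned in the thread.

```python
from datetime import timedelta

def parse_top_time(value: str) -> timedelta:
    """Parse a top TIME value, assumed to be 'minutes:seconds' of CPU time."""
    minutes, seconds = value.split(":")
    return timedelta(minutes=int(minutes), seconds=int(seconds))

def parse_uptime(days: int, hhmm: str) -> timedelta:
    """Parse an uptime such as '29 days, 23:47' (days plus hours:minutes)."""
    hours, minutes = hhmm.split(":")
    return timedelta(days=days, hours=int(hours), minutes=int(minutes))

# Figures copied from the two top snapshots in the original message.
clock_before = timedelta(hours=0, minutes=13)   # guest clock: 12:13am
clock_after = timedelta(hours=1, minutes=29)    # guest clock: 1:29am
uptime_delta = parse_uptime(30, "8:07") - parse_uptime(29, "23:47")
cpu_delta = parse_top_time("511:37") - parse_top_time("11:59")

# How far the guest clock moved versus how far uptime says it should have.
clock_delta = clock_after - clock_before
lost_time = uptime_delta - clock_delta
expected_clock = clock_before + uptime_delta

print("uptime advanced by:   ", uptime_delta)
print("perl CPU time grew by:", cpu_delta)
print("guest clock moved by: ", clock_delta)
print("time lost by clock:   ", lost_time)
print("expected clock time:  ", expected_clock)
```

Under those assumptions, uptime and the script's CPU time both advance by roughly 8 hours 20 minutes between the two entries, while the guest clock advances by only 1 hour 16 minutes. That leaves the clock about 7 hours short, which matches the "7 hours behind" observation, and it suggests that only the time-of-day clock lost time while the other counters kept going.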
