At risk of sounding clueless, I'll give it a shot.

On Sat, Mar 6, 2010 at 8:30 AM, Gabriel Campbell <gabcam at vodafone.com.mt> wrote:
> Hi
> I got a really odd situation .... was wondering if any1 came across
> it.
> The cpu time for the sched process is equal to the Virtual Machine uptime
> [ESX3.5] ...
> so even if i reboot the OS the ps -ef | grep sched will show like 300
> minutes of cpu time.
> Which will match exactly the lifetime/uptime of the virtual machine
>
> Some1 familiar with the way cpu time is calculated would have a clue on how
> this could be possible and point me in the right direction.
>

ps(1) gets its information about the "sched" process from /proc/0/psinfo. The time shown is the sum of user+system time from the psinfo structure [1]. That time is computed in prsubr.c [2] (line 2272) from the p_acct member of the proc_t structure [3]. Solaris Internals, page 903, says there is no tick processing for operating system kernel threads, so none of the user/system time values are incremented (see below).

>
> A problem exists .... and i belive its related somehow to this ubnormality
> ....
> this is the only machine which shows the cpu time for the sched process
> higher then a few seconds. (27 typically)
> Sched process cpu time in ps -ef | grep sched is always equal to the uptime
> of the virtual environment ..... which makes no sense ...
> What i mean is once i reboot the OS the sched process CPU time cant be
> higher then the OS uptime !!
>

Please send the results for the following commands.

    ps -ef | grep sched
    uptime

    # mdb -k
    > ::ps !grep sched
    R 0 0 0 0 0 0x00000001 fec21dd8 sched
    > fec21dd8::print proc_t p_stime
    p_stime = 0x7e
    > fec21dd8::print proc_t p_utime
    p_utime = 0
    > fec21dd8::print proc_t p_acct
    p_acct = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]

If you print this information over and over for "sched", it will not change. If you print it for a user process, you'll see p_stime incrementing.

> The core problem is that this node is part of a 4 node cluster which has
> been happily running reliably for almost a year ...
> suddenly this one node panics almost every 3 hours ... the panic reason is
> a pm_tick delay EQUAL to the virtual machine uptime
> ..... so both the sched process cpu TIME and the pm_tick delay are equal
> ..... thus im assuming somehow related .... to a common root
> problem/anomaly.
>

Probably because they approach zero. Can you send the whole panic message?

>
> Anything on how or what solaris is reading to end up calculating the cpu
> time ONLY for the sched process to be identical to the Virtual Machine
> (ESX3.5)
> uptime rather then the OS uptime ?
>

[1] - http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/procfs.h
[2] - http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/proc/prsubr.c
[3] - http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/proc.h

-- 
Giovanni Tirloni
sysdroid.com
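
P.S. When the outputs of ps and uptime come back, the check being discussed (is the CPU time reported for sched larger than the OS uptime?) could be done with something like the rough sketch below. The helper names are hypothetical, the accepted TIME layouts (MM:SS, HH:MM:SS, DD-HH:MM:SS) are an assumption about what ps prints, and the 600-second uptime is a made-up value; only the 300 minutes and the 27 seconds come from the report above.

```python
def ps_time_to_seconds(field: str) -> int:
    """Convert a ps TIME field to seconds.

    Assumed layouts: "MM:SS", "HH:MM:SS" or "DD-HH:MM:SS".
    """
    days = 0
    if "-" in field:
        # Optional leading day count, e.g. "1-02:03:04".
        day_part, field = field.split("-", 1)
        days = int(day_part)

    parts = [int(p) for p in field.split(":")]
    if len(parts) == 2:
        hours = 0
        minutes, seconds = parts
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unrecognised TIME field: {field!r}")

    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_time_exceeds_uptime(ps_time: str, uptime_seconds: int) -> bool:
    """True when the reported CPU time is larger than the OS uptime."""
    return ps_time_to_seconds(ps_time) > uptime_seconds


if __name__ == "__main__":
    # 300 minutes of CPU time against a hypothetical 10 minutes of uptime.
    print(cpu_time_exceeds_uptime("300:00", 600))
    # The typical 27 seconds against the same hypothetical uptime.
    print(cpu_time_exceeds_uptime("0:27", 600))
```

This only compares the two numbers; it says nothing about where the kernel gets the inflated value from.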