At risk of sounding clueless, I'll give it a shot.

On Sat, Mar 6, 2010 at 8:30 AM, Gabriel Campbell <gabcam at vodafone.com.mt> wrote:

> Hi
>        I got a really odd situation .... was wondering if any1 came across
> it.
> The cpu time for the sched process is equal to the Virtual Machine uptime
> [ESX3.5] ...
> so even if i reboot the OS the ps -ef | grep sched will show like 300
> minutes of cpu time.
> Which will match exactly the lifetime/uptime of the virtual machine
>
> Some1 familiar with the way cpu time is calculated would have a clue on how
> this could be possible and point me in the right direction.
>


ps(1) gets its information about the "sched" process from /proc/0/psinfo.
The time shown is the sum of user+system time from the psinfo structure
[1]. That time is computed in prsubr.c [2] (line 2272) from the p_acct
member of the proc_t structure [3].
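
To make the "user+system" part concrete, here is a rough sketch of the
arithmetic only. It is not the kernel or ps code: the helper names, the
(seconds, nanoseconds) pair layout, the minutes:seconds output and the
sample values are all my own assumptions for illustration.

```python
# Illustrative only: hypothetical helpers, not taken from prsubr.c or ps.
NANOS_PER_SEC = 1_000_000_000


def add_timestruc(a, b):
    """Add two (seconds, nanoseconds) pairs, carrying nanosecond overflow."""
    sec = a[0] + b[0]
    nsec = a[1] + b[1]
    if nsec >= NANOS_PER_SEC:
        sec += 1
        nsec -= NANOS_PER_SEC
    return sec, nsec


def format_cpu_time(total):
    """Render a (seconds, nanoseconds) pair as minutes:seconds, dropping the fraction."""
    minutes, seconds = divmod(total[0], 60)
    return f"{minutes}:{seconds:02d}"


# Made-up sample values, chosen so the sum lands on 300 minutes.
user = (0, 600_000_000)
system = (17_999, 700_000_000)

total = add_timestruc(user, system)
print(format_cpu_time(total))  # prints 300:00
```

If both inputs stay at zero, as they should for a kernel-only process,
the sum stays at zero too, which is why a large value for "sched" looks
suspicious.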

Solaris Internals (page 903) says there is no tick processing for operating
system kernel threads, so their user/system time values are never
incremented (see below).



>
> A problem exists .... and i belive its related somehow to this ubnormality
> ....
> this is the only machine which shows the cpu time for the sched process
> higher then a few seconds. (27 typically)
> Sched process cpu time in ps -ef | grep sched is always equal to the uptime
> of the virtual environment ..... which makes no sense ...
> What i mean is once i reboot the OS the sched process CPU time cant be
> higher then the OS uptime !!
>

Please send the results for the following commands.

ps -ef | grep sched
uptime

# mdb -k
> ::ps !grep sched
R      0      0      0      0      0 0x00000001 fec21dd8 sched
> fec21dd8::print proc_t p_stime
p_stime = 0x7e
> fec21dd8::print proc_t p_utime
p_utime = 0
> fec21dd8::print proc_t p_acct
p_acct = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]

If you print this information repeatedly for "sched", it will not change.
If you print it for a user process instead, you'll see p_stime incrementing.



> The core problem is that this node is part of a 4 node cluster which has
> been happily running reliably for almost a year ...
> suddenly this one node panics almost every 3 hours ... the panic reason is
> a pm_tick delay EQUAL to the virtual machine uptime
> ..... so both the sched process cpu TIME and the pm_tick delay are equal
> ..... thus im assuming somehow related .... to a common root
> problem/anomaly.
>

Probably because they approach zero. Can you send the whole panic message?


>
> Anything on how or what solaris is reading to end up calculating the cpu
> time ONLY for the sched process to be identical to the Virtual Machine
> (ESX3.5)
> uptime rather then the OS uptime ?
>


[1] http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/procfs.h
[2] http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/proc/prsubr.c
[3] http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/proc.h

-- 
Giovanni Tirloni
sysdroid.com