That is interesting!!

I'd have been inclined to:
- crack off a dtrace fbt profile run looking at what was on cpu most of
  the time, and what the most common stacks were
- grab a crash dump so you can poke around later...
- check out the cpu power states (using powertop or similar)

If you were stuck in the lowest c-state, all the things you described could
well happen, but I have never seen that happen...

I'd also be looking closely at something having clamped the arc. I have seen
arc_no_grow stuck as 1 in the past, and once in that state, the arc shrinks
to nearly nothing, causing much pain. mdb -k and then a ::arc for some arc
details and perhaps an arc_no_grow::print -t

Nothing much else comes immediately to mind...

Cheers!
Nathan.
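
A rough sketch of the ARC check suggested above, for anyone who would rather watch for it from a script than from mdb. It assumes the ARC counters are exposed as `kstat -p zfs:0:arcstats` with `size`, `c`, `c_min` and `c_max` fields, which should be verified on the release in question. The function names, the 5% slack and the sample numbers are all made up for illustration and are not taken from the affected host.

```python
# Hypothetical helper: flag an ARC whose target size (c) has collapsed to
# its floor (c_min). Input is text in `kstat -p` form: "module:inst:name:stat<TAB>value".

def parse_arcstats(text):
    """Return a dict of the integer-valued statistics found in the text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name = parts[0].rsplit(":", 1)[-1]
        try:
            stats[name] = int(parts[1])
        except ValueError:
            continue  # skip non-integer fields such as class or crtime
    return stats


def arc_looks_clamped(stats, slack=1.05):
    """True when c sits within `slack` of c_min although c_max is larger."""
    at_floor = stats["c"] <= stats["c_min"] * slack
    has_headroom = stats["c_max"] > stats["c_min"]
    return at_floor and has_headroom


# Invented sample values, roughly what a squeezed ARC might look like.
SAMPLE = """\
zfs:0:arcstats:c\t1073741824
zfs:0:arcstats:c_max\t68719476736
zfs:0:arcstats:c_min\t1073741824
zfs:0:arcstats:class\tmisc
zfs:0:arcstats:size\t1100000000
"""

if __name__ == "__main__":
    print("ARC looks clamped:", arc_looks_clamped(parse_arcstats(SAMPLE)))
```

On a live system the sample text would be replaced by the captured output of the kstat command. A True result is only a hint to go and look at arc_no_grow in mdb, not a diagnosis on its own.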
-------- Original message --------
From: Andre van Eyssen <[email protected]>
Date: 30/3/20 10:43 am (GMT+10:00)
To: [email protected]
Cc: Melbourne Solaris and Oracle Systems User Group <[email protected]>
Subject: [msosug] Interesting case

Heya,

Since we're having Fun Times with Solaris while
all locked-down, I'll add one that popped up over the weekend.

Host is running 11.3/x86. Good performance, normally stable and running a
near-idle CPU burn as this is primarily a storage host with moderate demands
(only writing about 2MB/sec average over time apart from bursts).

In an absolute instant with no warning, CPU usage starts running high. Here's
a graph from Zabbix:
http://mexico.purplecow.org/p/data/images/7/7.png

Simple operations like an ssh
connection to the host are very sluggish. Normally low-burning processes are
now consuming a real % of CPU. For example, nmbd was burning 5% according to
prstat.

The kernel process for running the main data zpool was now burning about
50% of the user time on the host and I/O was taking a long time to dequeue to
disk, leading to measurable performance changes on client systems.

FMA is reporting no activity and there is nothing in any logs, including the
non-default logfile that takes debug.* from syslog.

DRAC reports system
temperature of 23 degrees and CPU temperature of 42 degrees, fluctuating only
mildly.

Symptom-wise, for all intents and purposes it looked like the CPU had
throttled down to some insanely slow speed and was dragging everything through
the mud, now running up at 90% systime just servicing interrupts and ZFS.

Nothing interesting in lockstat. Nothing interesting in intrstat other than
the sheer % of CPU burned servicing mpt_sas and qlt.

After an extended period of
trying to work out the root cause, the machine was dealt an init 5 (after zpool
offline on a number of clients, good thing they were mirrored across heads...)
and after a very long shutdown the machine was re-powered with all symptoms
gone.

Ideas?

CPU burn graph including pre-event and post-reboot:
http://mexico.purplecow.org/p/data/images/9/9.png

Graph legend:
blue = user time
red = sys time
green = idle

Andre.

--
Andre van Eyssen. Phone: +61 417 211 788
mail: [email protected]
http://andre.purplecow.org
About & Contact: http://www.purplecow.org/andre.html
_______________________________________________
msosug mailing list
[email protected]
http://mexico.purplecow.org/m/listinfo/msosug
Delivered for: [email protected]