I have an X4500 running OpenSolaris 2008.11 (snv_101b) operating as an NFS 
v3/v4 server to about 80 clients.

Over the last few months, we've been seeing an issue where file lock requests
start to take a very long time (60-90 seconds), and the number of running lockd
threads rises from its normal 3-5 into the 50-100 range.

During these storms, the system seems otherwise healthy. File serving
performance is speedy, the load is below 2, we have lots of free RAM and swap,
and we're not pushing more I/Os than normal.

Sun support wants us to panic the box during one of these storms and collect a
crash dump, but that is unlikely to happen on our live production server, and I
can't reproduce the problem on any other machine.

Has anyone seen anything similar, or does anyone have tips on how to debug this
further? Maybe a DTrace script to see where it's spending so much time?
-- 
This message posted from opensolaris.org
