On 09/20/2016 01:39 PM, Lewis Hyatt wrote:
> Thanks very much for the suggestions. dmesg output is here:
> We don't see any disk-related stuff there, and also our GUI shows all
> the RAID arrays as being fine.
Hmmm .... I rarely trust GUIs for RAID. Do you have underlying CLI
tools you can do a sanity check with?
> If anything in there jumps out at you, I'd really appreciate your
> thoughts! We are almost certainly going to reboot the affected OSS later
> today to see how that goes.
Not seeing anything leap out other than two particular targets,
twlstr-OST000b and twlstr-OST0006, appear to be "slow". This appears to
be what is causing client evictions, lock bits, etc.
The question is why these two OSTs are slow. What is the underlying
RAID, how many operations are queued up, etc.?
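To put a rough number on "how many operations are queued up", one option on a Linux OSS is to read the in-flight I/O count per block device from /proc/diskstats. The Python sketch below is only an illustration: the function names are made up, and it assumes the standard diskstats layout, where the 9th statistics field after the device name is the number of I/Os currently in progress. Mapping a device name back to a particular OST is left out, since that depends on how the targets are set up.

```python
import os


def parse_inflight(text):
    """Return {device_name: ios_in_progress} from /proc/diskstats content.

    Assumed layout per line: major, minor, name, then the stat fields;
    the 9th stat field (index 11 overall) is I/Os currently in progress.
    """
    inflight = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip short or malformed lines
        inflight[parts[2]] = int(parts[11])
    return inflight


def busiest(inflight, top=5):
    """Return up to `top` (name, count) pairs, deepest queue first."""
    ranked = sorted(inflight.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    with open("/proc/diskstats") as fh:
        for name, count in busiest(parse_inflight(fh.read())):
            print("%-12s %d" % (name, count))
```

A single reading is just a snapshot, so it is worth sampling a few times. A device that consistently shows a much deeper queue than its peers is a reasonable place to start looking.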
A tool we recommend for (nearly instantaneous) holistic views of a
system is glances, which you can install via pip:
    pip install glances
then run it as
    glances -t 1
to get a second-by-second view of your system. Dstat is also good.
Dumb question ... what does
report? I am assuming you aren't swapping (and don't have swap enabled
on the system), but it never hurts to ask.
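One way to check whether swap is configured and in use, sketched here under the assumption of a Linux host that exposes /proc/meminfo with the usual SwapTotal and SwapFree lines (values in kB). The function name is hypothetical and chosen only for this example.

```python
import os


def parse_swap(meminfo_text):
    """Return (swap_total_kb, swap_used_kb) from /proc/meminfo content."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("SwapTotal", "SwapFree"):
            # Lines look like "SwapTotal:  2048 kB"; keep the number only.
            values[key] = int(rest.split()[0])
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    return total, total - free


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        total_kb, used_kb = parse_swap(fh.read())
    if total_kb == 0:
        print("no swap configured")
    else:
        print("swap: %d kB used of %d kB" % (used_kb, total_kb))
```

Nonzero used swap only shows that pages were swapped out at some point. It does not by itself show that the system is actively swapping right now.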
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
p: +1 734 786 8423 x121
c: +1 734 612 4615
lustre-discuss mailing list