So I'm trying to tidy up things like 'mmhealth' etc. I got most of it fixed, but I'm stuck on one thing.
Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which
cleaned out a bunch of other long-past events that were "stuck" as failed /
degraded even though they were corrected days/weeks ago - keep this in mind as
you read on....
# mmhealth cluster show
Component     Total   Failed   Degraded   Healthy   Other
---------------------------------------------------------
NODE             10        0          0        10       0
GPFS             10        0          0        10       0
NETWORK          10        0          0        10       0
FILESYSTEM        1        0          1         0       0
DISK            102        0          0       102       0
CES               4        0          0         4       0
GUI               1        0          0         1       0
PERFMON          10        0          0        10       0
THRESHOLD        10        0          0        10       0
Great. One hit for 'degraded' filesystem.
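For anyone who wants to script this check, here is a minimal Python sketch that picks the non-healthy components out of a summary like the one above. It assumes the rows are whitespace-separated with six columns (component, total, failed, degraded, healthy, other); the sample text and the function name `unhealthy_components` are just for illustration and are not part of mmhealth.

```python
# Sample rows copied from the 'mmhealth cluster show' summary in this post.
SAMPLE = """\
NODE 10 0 0 10 0
GPFS 10 0 0 10 0
NETWORK 10 0 0 10 0
FILESYSTEM 1 0 1 0 0
DISK 102 0 0 102 0
CES 4 0 0 4 0
GUI 1 0 0 1 0
PERFMON 10 0 0 10 0
THRESHOLD 10 0 0 10 0
"""


def unhealthy_components(text):
    """Return {component: (failed, degraded)} for rows with a nonzero count.

    Assumes six whitespace-separated columns per data row; header and
    separator lines are skipped because their fields are not all numeric.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        failed, degraded = int(fields[2]), int(fields[3])
        if failed or degraded:
            result[fields[0]] = (failed, degraded)
    return result


print(unhealthy_components(SAMPLE))
```

With the sample above this reports only FILESYSTEM, with 0 failed and 1 degraded.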
# mmhealth node show --unhealthy -N all
(skipping all the nodes that show healthy)

Node name:      arnsd3-vtc.nis.internal
Node status:    HEALTHY
Status Change:  21 hours ago
Component    Status     Status Change   Reasons
-----------------------------------------------------------------------------------
FILESYSTEM   FAILED     24 days ago     pool-data_high_error(archive/system)
(...)

Node name:      arproto2-isb.nis.internal
Node status:    HEALTHY
Status Change:  21 hours ago
Component    Status     Status Change   Reasons
-----------------------------------------------------------------------------------
FILESYSTEM   DEGRADED   6 days ago      pool-data_high_warn(archive/system)
mmdf tells me:
nsd_isb_01         13103005696   1   No    Yes   1747905536 ( 13%)   111667200 ( 1%)
nsd_isb_02         13103005696   1   No    Yes   1748245504 ( 13%)   111724384 ( 1%)
(94 more LUNs all within 0.2% of these for usage - data is striped out pretty well)
There are also 6 SSD LUNs for metadata:
nsd_isb_flash_01    2956984320   1   Yes   No    2116091904 ( 72%)    26996992 ( 1%)
(again, evenly striped)
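As a sanity check on those percentages, here is a small Python sketch that recomputes them from the raw KB figures. It assumes the two trailing columns are mmdf's usual "free KB in full blocks" and "free KB in fragments", so the percentage shown is free space rather than used space; if that assumption holds, the data LUNs are roughly 87% full, which may be what the pool-data_high_warn / pool-data_high_error events are reacting to. The tuple layout and the name `free_percent` are made up for this example.

```python
# (disk name, disk size in KB, free KB in full blocks, free KB in fragments)
# Values copied from the mmdf lines in this post; the column meanings are
# an assumption based on the usual mmdf layout.
ROWS = [
    ("nsd_isb_01", 13103005696, 1747905536, 111667200),
    ("nsd_isb_02", 13103005696, 1748245504, 111724384),
    ("nsd_isb_flash_01", 2956984320, 2116091904, 26996992),
]


def free_percent(size_kb, free_kb):
    """Free space as a percentage of the disk size."""
    return 100.0 * free_kb / size_kb


for name, size_kb, free_blocks_kb, _free_frags_kb in ROWS:
    pct_free = free_percent(size_kb, free_blocks_kb)
    print(f"{name}: {pct_free:.1f}% free, {100 - pct_free:.1f}% used")
```

The rounded results match the 13% and 72% that mmdf prints, so the figures in parentheses are consistent with being free space.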
So who is remembering that status, and how do I clear it?
