So I'm trying to tidy up things like 'mmhealth', etc. I've got most of it fixed,
but I'm stuck on one thing..

Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which
cleaned out a bunch of other long-past events that were "stuck" as failed /
degraded even though they had been corrected days or weeks ago - keep this in
mind as you read on....

# mmhealth cluster show

Component           Total         Failed       Degraded        Healthy        Other
-----------------------------------------------------------------------------------
NODE                   10              0              0             10            0
GPFS                   10              0              0             10            0
NETWORK                10              0              0             10            0
FILESYSTEM              1              0              1              0            0
DISK                  102              0              0            102            0
CES                     4              0              0              4            0
GUI                     1              0              0              1            0
PERFMON                10              0              0             10            0
THRESHOLD              10              0              0             10            0

Great.  One hit for 'degraded' filesystem.

# mmhealth node show --unhealthy -N all
(skipping all the nodes that show healthy)

Node name:      arnsd3-vtc.nis.internal
Node status:    HEALTHY
Status Change:  21 hours ago

Component      Status        Status Change     Reasons
-----------------------------------------------------------------------------------
FILESYSTEM     FAILED        24 days ago       pool-data_high_error(archive/system)
(...)
Node name:      arproto2-isb.nis.internal
Node status:    HEALTHY
Status Change:  21 hours ago

Component      Status        Status Change     Reasons
----------------------------------------------------------------------------------
FILESYSTEM     DEGRADED      6 days ago        pool-data_high_warn(archive/system)

mmdf tells me:
nsd_isb_01        13103005696        1 No       Yes      1747905536 ( 13%)     111667200 ( 1%)
nsd_isb_02        13103005696        1 No       Yes      1748245504 ( 13%)     111724384 ( 1%)
(94 more LUNs all within 0.2% of these for usage - data is striped out pretty well)

There are also 6 SSD LUNs for metadata:
nsd_isb_flash_01    2956984320        1 Yes      No       2116091904 ( 72%)      26996992 ( 1%)
(again, evenly striped)

So who is remembering that status, and how do I clear it?
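In case it helps anyone chasing the same thing, the stale FILESYSTEM entries above
can be pulled out of the per-node output with a quick filter. This is just a sketch:
it assumes the mmhealth output layout shown above (with the Reasons column on the
same line), and the awk is purely illustrative:

```shell
# List every node whose FILESYSTEM component reports something other than
# HEALTHY, along with the reason mmhealth gives for it.
# Assumes the "Node name: ..." / component-table layout shown above.
mmhealth node show --unhealthy -N all 2>/dev/null |
awk '
    /^Node name:/ { node = $3 }          # remember which node this block is for
    /^FILESYSTEM/ && $2 != "HEALTHY" {   # flag non-healthy FILESYSTEM rows
        printf "%s: FILESYSTEM %s (%s)\n", node, $2, $NF
    }
'
```

For the output pasted above, this would print one line each for
arnsd3-vtc.nis.internal (FAILED) and arproto2-isb.nis.internal (DEGRADED).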


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
