I’ve also been exploring mmhealth and the GPFS GUI for the first time this week.
I have a test cluster where I’m trying out the new features, running 4.2.2-2.
mmhealth cluster show says everyone is in nominal status:
Component     Total    Failed    Degraded    Healthy    Other
--------------------------------------------------------------
NODE             12         0           0         12        0
GPFS             12         0           0         12        0
NETWORK          12         0           0         12        0
FILESYSTEM        0         0           0          0        0
DISK              0         0           0          0        0
GUI               1         0           0          1        0
PERFMON          12         0           0         12        0
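
This is how I’m spot-checking the per-node view across the cluster to see
that it matches (mmdsh just fans the command out over the cluster’s remote
shell; one node’s output is pasted further down):

    mmdsh -N all 'mmhealth node show'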
However, the GUI shows conflicting information:
1) The home page shows 3/8 NSD servers unhealthy.
2) The home page shows 3/21 nodes unhealthy. Where is it getting these
numbers? There are only 12 nodes in the whole cluster! (See the quick
cross-check below.)
3) Clicking on either NSD Servers or Nodes leads to the monitoring page,
where the top half spins forever and the bottom half is content-free.
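
To rule out stale GUI data, I’m comparing its counts against what the
cluster itself reports (mmlscluster lists every member node; the awk below
is just a rough way of counting the numbered node lines and may need
adjusting for your output format):

    mmlscluster
    mmlscluster | awk '/^ *[0-9]+ /{n++} END{print n " nodes"}'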
I may have installed the pmsensors RPM on a couple of other nodes back in
early April, but I’ve forgotten which ones. Those nodes are in the
production cluster.
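
If I need to track those down, something like this should do it from the
production side (assuming mmdsh/ssh works between the nodes, and that the
package and service names match the usual "pmsensors" naming):

    mmdsh -N all 'rpm -qa | grep -i pmsensors'
    mmdsh -N all 'systemctl is-active pmsensors'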
Also, the storage in this sandbox cluster has not been turned into a
filesystem yet; there are a few dozen free NSDs. Perhaps the “FILESYSTEM
CHECKING” status is somehow wedging up the GUI?
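
For reference, this is how I’m confirming the disks really are free
(mmlsnsd -F lists only NSDs that don’t belong to any filesystem), along
with the verbose per-node health view:

    mmlsnsd -F
    mmhealth node show -v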
Node name: storage005.oscar.ccv.brown.edu
Node status: HEALTHY
Status Change: 15 hours ago
Component     Status      Status Change    Reasons
----------------------------------------------------
GPFS          HEALTHY     16 hours ago     -
NETWORK       HEALTHY     16 hours ago     -
FILESYSTEM    CHECKING    16 hours ago     -
GUI           HEALTHY     15 hours ago     -
PERFMON       HEALTHY     16 hours ago     -
I’ve tried restarting the GUI service and also rebooted the GUI server, but it
comes back looking the same.
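
(For the record, this is the restart I’m doing, plus where I’m looking for
errors; the GUI runs as the "gpfsgui" systemd unit on this box:)

    systemctl restart gpfsgui
    systemctl status gpfsgui
    journalctl -u gpfsgui --since '15 min ago'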
Any thoughts?
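
P.S. Anna, following your note below about the internal 10-second CLI
timeout: I’ll also time mmlsmgr on the GUI node and see whether it returns
inside that window; if it’s slow, that would at least be consistent with a
stale cluster-state view:

    time mmlsmgr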
> On May 11, 2017, at 7:28 AM, Anna Christina Wagner <[email protected]> wrote:
>
> Hello Bob,
>
> 4.2.2 is the release where we introduced "mmhealth cluster show". And you
> are totally right, it can be a little fragile at times.
>
> So, a short explanation:
> We had this situation on test machines as well. Because of issues with
> the system, not only the mm-commands but also ordinary Linux commands
> took more than 10 seconds to return. We have an internal default timeout
> of 10 seconds for CLI commands. So if you had a failover situation in
> which the cluster manager changed (our cluster state manager (CSM) runs
> on the cluster manager) and mmlsmgr did not return within 10 seconds, the
> node does not know that it is the CSM and will not start the
> corresponding service.
>
>
> If you want me to look further into it or if you have feedback regarding
> mmhealth please feel free to send me an email ([email protected])
>
> Kind regards
>
> Wagner, Anna Christina
>
> Software Engineer, Spectrum Scale Development
> IBM Systems
>
>
>
>
> From: "Oesterlin, Robert" <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Date: 10.05.2017 18:21
> Subject: Re: [gpfsug-discuss] "mmhealth cluster show" returns error
> Sent by: [email protected]
>
>
>
> Yea, it’s fine.
>
> I did manage to get it to respond after I did a “mmsysmoncontrol restart” but
> it’s still not showing proper status across the cluster.
>
> Seems a bit fragile :-)
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
>
>
>
> On 5/10/17, 10:46 AM, [email protected] (via
> [email protected]) wrote:
>
> On Wed, 10 May 2017 14:13:56 -0000, "Oesterlin, Robert" said:
>
> > [root]# mmhealth cluster show
> > nrg1-gpfs16.nrg1.us.grid.nuance.com: Could not find the cluster state
> manager. It may be in an failover process. Please try again in a few seconds.
>
> Does 'mmlsmgr' return something sane?
>
>
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss