Hi Andrew, thanks for the answer. No, port #2 (on all the nodes) is not cabled.
Regards,
Alvise

________________________________
From: [email protected] [[email protected]] on behalf of Andrew Beattie [[email protected]]
Sent: Thursday, June 28, 2018 10:15 AM
To: [email protected]
Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events

Do you know if there is actually a cable plugged into port 2? The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly.

Regards,

Andrew Beattie
Software Defined Storage - IT Specialist
Phone: 614-2133-7927
E-mail: [email protected]

----- Original message -----
From: "Dorigo Alvise (PSI)" <[email protected]>
Sent by: [email protected]
To: "[email protected]" <[email protected]>
Cc:
Subject: [gpfsug-discuss] How to get rid of very old mmhealth events
Date: Thu, Jun 28, 2018 6:08 PM

Dear experts,
I have a GL2 IBM system running Spectrum Scale v4.2.3-6 (RHEL 7.3). The system is working properly, but I get a DEGRADED status report for the NETWORK component when running mmhealth:

[root@sf-gssio1 ~]# mmhealth node show

Node name:      sf-gssio1.psi.ch
Node status:    DEGRADED
Status Change:  23 min. ago

Component  Status    Status Change  Reasons
-------------------------------------------------------------------------------------------------------------------------------------------
GPFS       HEALTHY   22 min. ago    -
NETWORK    DEGRADED  145 days ago   ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2)
[...]

This event is clearly stale, because the network, verbs, and IB are all working correctly:

[root@sf-gssio1 ~]# mmfsadm test verbs status
VERBS RDMA status: started

[root@sf-gssio1 ~]# mmlsconfig verbsPorts | grep gssio1
verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2]

[root@sf-gssio1 ~]# mmdiag --config | grep verbsPorts
! verbsPorts mlx5_0/1

[root@sf-gssio1 ~]# ibstat mlx5_0
CA 'mlx5_0'
        CA type: MT4113
        Number of ports: 2
        Firmware version: 10.16.1020
        Hardware version: 0
        Node GUID: 0xec0d9a03002b5db0
        System image GUID: 0xec0d9a03002b5db0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 42
                LMC: 0
                SM lid: 1
                Capability mask: 0x26516848
                Port GUID: 0xec0d9a03002b5db0
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x26516848
                Port GUID: 0xec0d9a03002b5db8
                Link layer: InfiniBand

That event has been there for 145 days, and it did not go away after a daemon restart (mmshutdown/mmstartup).

My question is: how can I get rid of this event and restore mmhealth's output to HEALTHY? This is important because I have Nagios sensors that periodically parse the "mmhealth -Y ..." output, and at the moment I have to disable their email notifications (which is not good if some real bad event happens).

Thanks,

Alvise
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
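Since the quoted message mentions Nagios sensors that parse the "mmhealth -Y ..." output, here is a minimal, hedged sketch of such a parser. The sample lines and field names (`component`, `status`) are assumptions for illustration, not captured from a real system; the exact columns vary by Spectrum Scale version, which is why the sketch maps fields from the HEADER row rather than hard-coding offsets.

```python
#!/usr/bin/env python3
# Illustrative sketch: parse colon-delimited "mmhealth node show -Y" output.
# SAMPLE below is synthetic (field names are assumptions); on a real node the
# HEADER row names the columns, so we index by name instead of fixed position.

SAMPLE = """\
mmhealth:State:HEADER:version:reserved:reserved:node:component:status:laststatuschange
mmhealth:State:0:1:::sf-gssio1.psi.ch:NODE:DEGRADED:2018-06-28
mmhealth:State:0:1:::sf-gssio1.psi.ch:GPFS:HEALTHY:2018-06-28
mmhealth:State:0:1:::sf-gssio1.psi.ch:NETWORK:DEGRADED:2018-02-03
"""

def parse_states(text):
    """Return {component: status}, locating columns via the HEADER row."""
    header, states = None, {}
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 3 or fields[1] != "State":
            continue          # skip unrelated record types
        if fields[2] == "HEADER":
            header = fields   # remember column names for this record type
            continue
        if header:
            states[fields[header.index("component")]] = \
                fields[header.index("status")]
    return states

if __name__ == "__main__":
    states = parse_states(SAMPLE)
    bad = {c: s for c, s in states.items() if s != "HEALTHY"}
    # Nagios convention: a check would exit 2 (CRITICAL) when bad is non-empty.
    print(bad)
```

In a real check the sample text would be replaced by the output of the mmhealth command itself; the name-based lookup is what keeps the check from breaking when a release inserts extra columns.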
