Hello Daniel,

I've solved my problem (I'm on GPFS v4.2.3-5) by disabling the check: I put
ib_rdma_enable_monitoring=False in the [network] section of
/var/mmfs/mmsysmon/mmsysmonitor.conf and restarted mmsysmonitor.
There was a thread in this group about this problem.

   A
________________________________
From: [email protected] [[email protected]] on behalf of Yaron Daniel [[email protected]]
Sent: Sunday, July 01, 2018 7:17 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events

Hi

There was an issue in Scale 5.x with the GUI error ib_rdma_nic_unrecognized(mlx5_0/2).

Check if you have the patch:

[root@gssio1 ~]# diff /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py /tmp/NetworkService.py
229c229,230
<         recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm))
---
>         #recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n", mmfsadm))
>         recognizedNICs = set(re.findall(r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n", mmfsadm))

Then restart the monitor:

mmsysmoncontrol restart

Regards
________________________________
Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd
Petach Tiqva, 49527
Israel
Phone: +972-3-916-5672
Fax: +972-3-916-5672
Mobile: +972-52-8395593
e-mail: [email protected]
IBM Israel <http://www.ibm.com/il/he/>

From: "Andrew Beattie" <[email protected]>
To: [email protected]
Date: 06/28/2018 11:16 AM
Subject: Re: [gpfsug-discuss] How to get rid of very old mmhealth events
Sent by: [email protected]
________________________________

Do you know if there is actually a cable plugged into port 2?
The system will work fine as long as there is network connectivity, but you may have an issue with redundancy or loss of bandwidth if you do not have every port cabled and configured correctly.

Regards
Andrew Beattie
Software Defined Storage - IT Specialist
Phone: 614-2133-7927
E-mail: [email protected]

----- Original message -----
From: "Dorigo Alvise (PSI)" <[email protected]>
Sent by: [email protected]
To: "[email protected]" <[email protected]>
Cc:
Subject: [gpfsug-discuss] How to get rid of very old mmhealth events
Date: Thu, Jun 28, 2018 6:08 PM

Dear experts,

I have a GL2 IBM system running Spectrum Scale v4.2.3-6 (RHEL 7.3).
The system is working properly, but mmhealth reports a DEGRADED status for the NETWORK component:

[root@sf-gssio1 ~]# mmhealth node show

Node name:      sf-gssio1.psi.ch
Node status:    DEGRADED
Status Change:  23 min. ago

Component   Status     Status Change   Reasons
-------------------------------------------------------------------------------------------------------------------------------------------
GPFS        HEALTHY    22 min. ago     -
NETWORK     DEGRADED   145 days ago    ib_rdma_link_down(mlx5_0/2), ib_rdma_nic_down(mlx5_0/2), ib_rdma_nic_unrecognized(mlx5_0/2)
[...]

This event is clearly an outlier, because the network, verbs and IB are all working correctly:

[root@sf-gssio1 ~]# mmfsadm test verbs status
VERBS RDMA status: started

[root@sf-gssio1 ~]# mmlsconfig verbsPorts|grep gssio1
verbsPorts mlx5_0/1 [sf-ems1,sf-gssio1,sf-gssio2]

[root@sf-gssio1 ~]# mmdiag --config|grep verbsPorts
 ! verbsPorts mlx5_0/1

[root@sf-gssio1 ~]# ibstat mlx5_0
CA 'mlx5_0'
        CA type: MT4113
        Number of ports: 2
        Firmware version: 10.16.1020
        Hardware version: 0
        Node GUID: 0xec0d9a03002b5db0
        System image GUID: 0xec0d9a03002b5db0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 42
                LMC: 0
                SM lid: 1
                Capability mask: 0x26516848
                Port GUID: 0xec0d9a03002b5db0
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0x26516848
                Port GUID: 0xec0d9a03002b5db8
                Link layer: InfiniBand

That event has been there for 145 days, and it didn't go away after a daemon restart (mmshutdown/mmstartup).
My question is: how can I get rid of this event and restore the mmhealth output to HEALTHY?
This is important because I have Nagios sensors that periodically parse the "mmhealth -Y ..." output, and at the moment I have to disable their email notification (which is not good if some really bad event happens).

Thanks,

   Alvise
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
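
For the Nagios side of the question, here is a minimal sketch of a parser for the "mmhealth -Y ..." output that lists every component not reporting HEALTHY. It assumes the usual colon-delimited -Y layout in which a HEADER row names the fields of the data rows that follow. The field names (component, status) and the SAMPLE text are assumptions made up for illustration, not output captured from this system; parse_y_output and unhealthy_components are hypothetical helper names.

```python
from urllib.parse import unquote


def parse_y_output(text):
    """Turn colon-delimited '-Y' text into a list of dicts keyed by header names."""
    headers = {}  # (command, record type) -> field names taken from the HEADER row
    records = []
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a -Y record
        key = (fields[0], fields[1])
        if fields[2] == "HEADER":
            headers[key] = fields[3:]
        elif key in headers:
            # Values may be percent-encoded (e.g. %3A for ':'), so decode them.
            row = {name: unquote(value)
                   for name, value in zip(headers[key], fields[3:]) if name}
            row["record_type"] = fields[1]
            records.append(row)
    return records


def unhealthy_components(records):
    """Return (component, status) pairs for State records that are not HEALTHY."""
    return [(r["component"], r["status"])
            for r in records
            if r["record_type"] == "State" and r.get("status") != "HEALTHY"]


# Hypothetical sample in the assumed layout (NOT real output from this cluster).
SAMPLE = (
    "mmhealth:State:HEADER:version:reserved:reserved:node:component:"
    "entityname:entitytype:status:laststatuschange:\n"
    "mmhealth:State:0:1:::sf-gssio1.psi.ch:GPFS:sf-gssio1.psi.ch:NODE:HEALTHY::\n"
    "mmhealth:State:0:1:::sf-gssio1.psi.ch:NETWORK:sf-gssio1.psi.ch:NODE:DEGRADED::\n"
)

if __name__ == "__main__":
    for component, status in unhealthy_components(parse_y_output(SAMPLE)):
        print(f"{component}: {status}")
```

Because the field positions are read from the HEADER row rather than hard-coded, a sensor built this way should keep working if a release adds or reorders columns, provided the component and status field names stay the same.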
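
To see what the NetworkService.py patch quoted in this thread changes, here is a small sketch that runs both regular expressions from the diff against two sample lines. The two patterns are copied from the diff; the sample lines are hypothetical, constructed only to fit each pattern, and are not real mmfsadm output. recognized_nics is an illustrative helper name, not a function from the product.

```python
import re

# Patterns copied from the diff quoted in this thread.
OLD_PATTERN = r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+\n"
NEW_PATTERN = r"verbsConnectPorts\[\d+\] +: (\w+/\d+)/\d+/\d+\n"

# Hypothetical lines, built to match each pattern (NOT captured mmfsadm output).
ONE_TRAILING_FIELD = "verbsConnectPorts[0] : mlx5_0/1/0\n"
TWO_TRAILING_FIELDS = "verbsConnectPorts[0] : mlx5_0/1/0/0\n"


def recognized_nics(pattern, mmfsadm_output):
    """Collect the device/port pairs that the given pattern extracts."""
    return set(re.findall(pattern, mmfsadm_output))


if __name__ == "__main__":
    for label, text in (("one trailing field", ONE_TRAILING_FIELD),
                        ("two trailing fields", TWO_TRAILING_FIELDS)):
        print(label,
              "old:", recognized_nics(OLD_PATTERN, text),
              "new:", recognized_nics(NEW_PATTERN, text))
```

Each pattern extracts the device/port pair only from the line shape it was written for and returns an empty set for the other. If the monitor runs a pattern that does not fit the output it receives, no NIC ends up in recognizedNICs, which would be consistent with the ib_rdma_nic_unrecognized event discussed above.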
