Hi Yaron,

Please see the outputs of mmlsconfig and ibstat below:
sudo /usr/lpp/mmfs/bin/mmlsconfig | grep -i verbs
verbsRdmasPerNode 192
verbsRdma enable
verbsRdmaSend yes
verbsRdmasPerConnection 48
verbsRdmasPerConnection 16
verbsPorts mlx5_4/1/1 mlx5_5/1/2
verbsPorts mlx4_0/1/0 mlx4_0/2/0
verbsPorts mlx5_0/1/1 mlx5_1/1/2
verbsPorts mlx5_0/1/1 mlx5_2/1/2
verbsPorts mlx5_2/1/1 mlx5_3/1/2

ibstat output on NSD server:

CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.25.1020
    Hardware version: 0
    Node GUID: 0x506b4b03000fdb74
    System image GUID: 0x506b4b03000fdb74
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x526b4bfffe0fdb74
        Link layer: Ethernet
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.25.1020
    Hardware version: 0
    Node GUID: 0x506b4b03000fdb75
    System image GUID: 0x506b4b03000fdb74
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x526b4bfffe0fdb75
        Link layer: Ethernet
CA 'mlx5_2'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.25.1020
    Hardware version: 0
    Node GUID: 0xec0d9a0300a7e928
    System image GUID: 0xec0d9a0300a7e928
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x526b4bfffe0fdb74
        Link layer: Ethernet
CA 'mlx5_3'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.25.1020
    Hardware version: 0
    Node GUID: 0xec0d9a0300a7e929
    System image GUID: 0xec0d9a0300a7e928
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0xee0d9afffea7e929
        Link layer: Ethernet
CA 'mlx5_4'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.25.1020
    Hardware version: 0
    Node GUID: 0xec0d9a0300da5f92
    System image GUID: 0xec0d9a0300da5f92
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 13
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0xec0d9a0300da5f92
        Link layer: InfiniBand
CA 'mlx5_5'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.25.1020
    Hardware version: 0
    Node GUID: 0xec0d9a0300da5f93
    System image GUID: 0xec0d9a0300da5f92
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 6
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0xec0d9a0300da5f93
        Link layer: InfiniBand

ibstat output on CES server:

CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.22.4030
    Hardware version: 0
    Node GUID: 0xb88303ffff5ec6ec
    System image GUID: 0xb88303ffff5ec6ec
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 9
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0xb88303ffff5ec6ec
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.22.4030
    Hardware version: 0
    Node GUID: 0xb88303ffff5ec6ed
    System image GUID: 0xb88303ffff5ec6ec
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 12
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0xb88303ffff5ec6ed
        Link layer: InfiniBand
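In case it helps with matching these up, here is a quick way to summarize each adapter's link layer and port state straight from ibstat, so each entry can be compared against the devices named in verbsPorts (e.g. mlx5_4/1 and mlx5_5/1 on this NSD server). This is only a rough sketch and assumes the standard ibstat output layout shown above:

  # Summarize each HCA's link layer and port state; reports the last port of
  # each adapter (all adapters above are single-port).
  for ca in $(ibstat -l); do
      ibstat "$ca" | awk -v ca="$ca" '
          $1 == "State:"                 { state = $2 }
          $1 == "Link" && $2 == "layer:" { ll = $3 }
          END { printf "%-8s  %-12s  %s\n", ca, ll, state }'
  done

On this output it makes it easy to see at a glance which verbsPorts entries point at InfiniBand ports and which point at Ethernet (RoCE) ports or ports that are down.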
Prasad Surampudi | Sr. Systems Engineer | ATS Group, LLC <http://www.theatsgroup.com/>

________________________________
From: [email protected] <[email protected]> on behalf of [email protected] <[email protected]>
Sent: Thursday, July 23, 2020 3:09 AM
To: [email protected] <[email protected]>
Subject: gpfsug-discuss Digest, Vol 102, Issue 12

Today's Topics:

   1. Spectrum Scale pagepool size with RDMA (Prasad Surampudi)
   2. Re: Spectrum Scale pagepool size with RDMA (Yaron Daniel)

----------------------------------------------------------------------

Message: 1
Date: Thu, 23 Jul 2020 00:34:02 +0000
From: Prasad Surampudi <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [gpfsug-discuss] Spectrum Scale pagepool size with RDMA

Hi,

We have an ESS cluster with two CES nodes. The pagepool is set to 128 GB (real memory is 256 GB) on both the ESS NSD servers and the CES nodes. Occasionally we see the mmfsd process memory usage reach 90% on the NSD servers and CES nodes and stay there until GPFS is recycled. I have a couple of questions about this scenario:

1. What are the general recommendations for pagepool size on nodes with RDMA enabled? The IBM Knowledge Center section on RDMA tuning says, "If the GPFS pagepool is set to 32 GB, then the mapping of the RDMA for this pagepool must be at least 64 GB." Does this mean that the pagepool can't be more than half of real memory with RDMA enabled? Is this also the reason why mmfsd memory usage exceeds the pagepool size and spikes to almost 90%?
2. If we don't want to see high mmfsd process memory usage on the NSD/CES nodes, should we decrease the pagepool size?
3. Can we tune the log_num_mtt parameter to limit memory usage? Currently it is set to 0 on both the NSD (ppc64le) and CES (x86_64) nodes.
4. We also see messages like "Verbs RDMA disabled for xx.xx.xx.xx due to no matching port found". Any idea what this message indicates? I don't see any "Verbs RDMA enabled" message after these warnings. Does it get enabled automatically?

Prasad Surampudi | Sr. Systems Engineer | ATS Group, LLC <http://www.theatsgroup.com/>
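For reference, a few commands that can help narrow down questions 1-3. This is only a sketch using standard Spectrum Scale commands plus the usual MOFED sysfs path; as far as I know, log_num_mtt is an mlx4_core-only module parameter, so the last check returns nothing on pure mlx5 (ConnectX-4/5) nodes:

  # Configured vs. in-effect pagepool on this node
  /usr/lpp/mmfs/bin/mmlsconfig pagepool
  /usr/lpp/mmfs/bin/mmdiag --config | grep -i pagepool

  # Breakdown of mmfsd memory (pagepool, shared segment, heap) to see what is
  # actually driving the ~90% resident size
  /usr/lpp/mmfs/bin/mmdiag --memory

  # log_num_mtt only exists for mlx4_core; mlx5 HCAs do not use it
  cat /sys/module/mlx4_core/parameters/log_num_mtt 2>/dev/null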
------------------------------

Message: 2
Date: Thu, 23 Jul 2020 10:09:17 +0300
From: "Yaron Daniel" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] Spectrum Scale pagepool size with RDMA

Hi,

What is the output of:

# mmlsconfig | grep -i verbs
# ibstat

Regards,

Yaron Daniel
Storage Architect, IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672
Fax: +972-3-916-5672
Mobile: +972-52-8395593
e-mail: [email protected]
Webex: https://ibm.webex.com/meet/yard
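Regarding question 4 above (whether Verbs RDMA gets re-enabled after the "no matching port found" warning), the GPFS daemon log on the affected node should show whether RDMA initialization eventually succeeded for any port. A minimal check, assuming the default log location:

  # Show the most recent VERBS RDMA enable/disable messages from the daemon log
  grep -i "verbs rdma" /var/adm/ras/mmfs.log.latest | tail -20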
------------------------------

End of gpfsug-discuss Digest, Vol 102, Issue 12
***********************************************
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
