Hi Rupert,
The firmware of the HCAs has been updated to the latest stable version, but we are still seeing the same issue. Updating the OFED library will be more difficult; do you really think it could be the cause?

With Ethernet it is common practice to enlarge the TCP buffers for high-throughput or high-latency networks. Is there something similar for InfiniBand?
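If I understand correctly, the closest InfiniBand counterpart to TCP window tuning for RDMA READ would be the number of READ operations a QP may keep outstanding (throughput over a long link is roughly bounded by outstanding reads x message size / round-trip time), but I have not verified that this is our bottleneck. Assuming those limits are what matters, I would inspect them on each HCA with something like:

    # per-QP limits for outstanding RDMA READ/atomic operations, as reported by libibverbs
    ibv_devinfo -v | grep -i rd_atom

Does that sound like the right place to look?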
Best regards,

Koen Segers
Enterprise Consultant
Computacenter Services & Solutions
Ikaroslaan 31, B-1930 Zaventem, Belgium
Tel: +32 2 704 94 67 | Fax: +32 2 704 95 95 | Mob: +32 497 909353
[email protected]
www.computacenter.com/benelux

From: Rupert Dance <[email protected]>
Sent: 17 October 2011 16:26
To: [email protected]; [email protected]
Subject: RE: [ewg] 200m cable results in slower rdma read performance?

Koen,

Can you try running:

    ibdiagnet -P all=1 -ls 10 -lw 4x

This will tell us if any links are not running at a link speed of 10 (QDR) and a link width of 4x.

You may also want to suggest an upgrade of OFED to 1.5.3.2 GA; there have been major improvements in the stack since 1.4.2. Also please be sure that you update the firmware in all hardware for the same reason.
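As a quick per-host sanity check (ibdiagnet gives the fabric-wide view, so treat this only as a complement), each HCA port that has negotiated 4x QDR should report a rate of 40 in ibstat:

    # on each host: port state and negotiated rate (40 = 4x QDR)
    ibstat | egrep 'State|Rate'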
Thanks,

Rupert

From: [email protected]
Sent: Monday, October 17, 2011 9:22 AM
To: [email protected]; [email protected]
Cc: [email protected]
Subject: RE: [ewg] 200m cable results in slower rdma read performance?

Rupert,

Thanks for replying. Below is the output of the ibdiagnet command; I don't see any issues here. Just tell me if you need more info.

I forgot to mention that we are using the following switch version:

    edgeprod1# version show
    version: 3.6.0
    date: Jun 07 2011 11:19:33 AM
    build Id: 857

and the default SLES 11 SP1 OFED build: ofed-1.4.2-0.9.6.

Best regards,

15:00:28|root@gpfsprod1n1:~ 0 # ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.2
-W- A few ports of local device are up.
    Since port-num was not specified (-p option), port 1 of device 1 will be used as the local port.
-I- Discovering ... 39 nodes (6 Switches & 33 CA-s) discovered.

-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I- PKey:0x7fff Hosts:65 full:65 partial:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I---------------------------------------------------
-I- No bad link were found

----------------------------------------------------------------
-I- Stages Status Report:

    STAGE                           Errors  Warnings
    Bad GUIDs/LIDs Check            0       0
    Link State Active Check         0       0
    Performance Counters Report     0       0
    Partitions Check                0       0
    IPoIB Subnets Check             0       1

Please see /tmp/ibdiagnet.log for complete log
----------------------------------------------------------------
-I- Done. Run time was 5 seconds.

This type of info is given in ibdiagnet.lst:

{ SW Ports:24 SystemGUID:0008f10500108fa9 NodeGUID:0008f10500108fa8 PortGUID:0008f10500108fa8 VenID:000008F1 DevID:5A5A0000 Rev:000000A1 {Voltaire 4036 # edgeprod3} LID:0001 PN:05 }
{ CA Ports:02 SystemGUID:0002c903004ab175 NodeGUID:0002c903004ab172 PortGUID:0002c903004ab173 VenID:000002C9 DevID:673C0000 Rev:000000B0 { HCA-1} LID:001A PN:01 }
PHY=4x LOG=ACT SPD=10

Koen Segers
Computacenter Services & Solutions

From: Rupert Dance <[email protected]>
Sent: 17 October 2011 13:46
To: [email protected]; [email protected]
Subject: RE: [ewg] 200m cable results in slower rdma read performance?

Hi,

Have you run ibdiagnet to verify that your link width and speed are what you expect on all links?

Thanks,

Rupert Dance
Software Forge

From: [email protected] On Behalf Of [email protected]
Sent: Monday, October 17, 2011 3:22 AM
To: [email protected]
Subject: [ewg] 200m cable results in slower rdma read performance?

Hi,

In my test setup I have three servers, two of which reside in Datacenter 1 and one in Datacenter 2. If I run an RDMA test between the datacenters, I get much lower performance than when I run the same test between servers in the same datacenter.

DC1: gpfsprod1n1, gpfsprod1n3
DC2: gpfsprod1n2

08:54:48|root@gpfsprod1n1:~ 0 # qperf -t 5 cic-gpfsprod1n2 rc_rdma_write_bw
rc_rdma_write_bw:
    bw = 1.9 GB/sec
08:54:59|root@gpfsprod1n1:~ 0 # qperf -t 5 cic-gpfsprod1n3 rc_rdma_write_bw
rc_rdma_write_bw:
    bw = 3.39 GB/sec
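(The numbers above are write bandwidth; I still have to collect the corresponding READ and latency figures for both paths. I assume the equivalent qperf tests would be the following, but please correct me if other tests are more useful:)

    # RDMA READ bandwidth and RC latency over the inter-datacenter path and the local path
    qperf -t 5 cic-gpfsprod1n2 rc_rdma_read_bw rc_lat
    qperf -t 5 cic-gpfsprod1n3 rc_rdma_read_bw rc_lat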
The setup contains two pairs of edge switches (one pair per datacenter) and two spine switches (one per datacenter), configured as a non-blocking fat tree: the servers are connected to the edge switches, and each spine switch is connected to all edge switches.

These are the cables we are using:

- Length 5m, Vendor Name: WLGORE, Code: QSFP+, Vendor PN: 498385-B24, Vendor Rev: D, Vendor SN: xxxx
- Length 200m, Vendor Name: MOLEX, Code: QSFP+, Vendor PN: 106410-1200, Vendor Rev: A, Vendor SN: xxxx

Can someone tell me why this is happening, and how I can solve it?

Best regards,

Koen Segers
Computacenter Services & Solutions

_______________________________________________
ewg mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
