Hi

We are.

Something in your results is odd.

We had pretty good results. Latency halved, which greatly improved throughput.

We use two separate fabrics. Configuration was not straightforward, as this case is 
not covered by essgennetworks, but it was worth it.

In your case you are down to one fabric, as your clients have a single 25G port.
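
For reference, the split is just the optional fabric number in verbsPorts. A minimal 
sketch of the two setups, assuming device names mlx5_0/mlx5_1 and the built-in 
nsdNodes/clientNodes node classes (substitute your own devices and node classes):

    # NSD servers: two separate fabrics, one fabric number per device/port
    mmchconfig verbsPorts="mlx5_0/1/1 mlx5_1/1/2" -N nsdNodes

    # single-port clients: everything on one fabric
    mmchconfig verbsPorts="mlx5_0/1/1" -N clientNodes

    # RoCE also needs the RDMA CM path enabled alongside verbsRdma
    mmchconfig verbsRdma=enable,verbsRdmaCm=enable

verbsPorts changes only take effect once mmfsd is restarted on the affected nodes.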



________________________________
From: gpfsug-discuss <[email protected]> on behalf of Luke 
Sudbery <[email protected]>
Sent: Friday, January 23, 2026 5:38:43 PM
To: gpfsug main discussion list <[email protected]>
Subject: [EXTERNAL] [gpfsug-discuss] Anyone using RoCE?


Is anyone using RoCE with good results? We are planning on it, but initial 
tests are not great – we get much better performance using plain Ethernet over 
the exact same links.



It’s up and working: I can see RDMA connections and counters, with no errors. But 
performance is unstable, and worse than Ethernet, which was only meant to be a 
sanity check!



Things I’ve looked at based on the Lenovo and IBM guides, which I think are all 
configured correctly (a rough command-level sketch follows the list):

  *   RoCE interfaces all on the same subnet
  *   They all have IPv6 enabled, with addresses using eui64 addr-gen-mode
  *   DSCP trust mode on NICs
  *   PFC flow control on NICs
  *   Global Pause disabled on NICs
  *   ToS configured for RDMA_CM
  *   Source-based routing for multiple interfaces on the same subnet
  *   Switches (NVIDIA Cumulus) all enabled for RoCE QoS

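For concreteness, on ConnectX/mlx5 hosts those settings map roughly onto commands 
like the following. The interface/device names, addresses, and the priority 3 / 
ToS 106 (DSCP 26) values are common defaults used for illustration, not anything 
verified for this cluster:

    # trust DSCP markings and enable PFC on the lossless priority only
    mlnx_qos -i ens1f0 --trust dscp
    mlnx_qos -i ens1f0 --pfc 0,0,0,1,0,0,0,0

    # global pause must stay off when per-priority flow control is in use
    ethtool -A ens1f0 rx off tx off

    # ToS used by RDMA_CM connections (106 = DSCP 26 with the ECT bit set)
    cma_roce_tos -d mlx5_0 -t 106

    # source-based routing so traffic leaves via the interface that owns the address
    ip rule add from 10.10.0.11/32 table 100
    ip route add 10.10.0.0/24 dev ens1f0 src 10.10.0.11 table 100
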


iperf and GPFS over plain Ethernet get nearly 3 GB/s, which is close to the line 
speed of the NIC in question (25 Gbps). Testing basic RDMA connections with 
ib_send_bw gets about the same. But GPFS over RoCE ranges from 0.7 GB/s to 1.9 GB/s.
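
For anyone wanting to reproduce the RDMA-level comparison, an ib_send_bw run looks 
roughly like this, with -R so the connection is set up over RDMA_CM (the path GPFS 
uses when verbsRdmaCm is enabled); the device name and server address are placeholders:

    # server side
    ib_send_bw -d mlx5_0 -R --report_gbits

    # client side
    ib_send_bw -d mlx5_0 -R --report_gbits 10.10.0.1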



The servers have 4x 200G Mellanox cards. The client has 1x 25G card. What’s 
frustrating and confusing is that we get better performance when we enable just one 
card at the server end, and also better performance if we use one fabric ID per NIC 
on the server (with all four fabric IDs on the same NIC at the client end).



I can go into more details if anyone has experience! Does this sound familiar 
to anyone? I am planning to open a call with Lenovo and/or IBM as I’m not quite 
sure where to look next.



Cheers,



Luke



--

Luke Sudbery

Principal Engineer (HPC and Storage).

Architecture, Infrastructure and Systems

Advanced Research Computing, IT Services

Room 132, Computer Centre G5, Elms Road



Please note I don’t work on Monday.




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
