Hi,

We are. Something odd is going on at your end, then, because we had pretty good results: latency halved, which gave a big improvement in throughput. We use two separate fabrics. Configuration was not straightforward, as this case is not covered by essgennetworks, but it was worth it. In your case you are down to one fabric, since your clients have a single 25G port.
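For reference, the Spectrum Scale side of what we mean looks roughly like the sketch below. The device names, fabric numbers and node classes (mlx5_0, mlx5_1, nsdNodes, clientNodes) are placeholder examples, not our actual values:

    # Enable RDMA in the daemon and use RDMA-CM for connection setup (needed for RoCE)
    mmchconfig verbsRdma=enable
    mmchconfig verbsRdmaCm=enable

    # verbsPorts entries are device/port/fabric-number; here each server NIC sits on its own fabric
    mmchconfig verbsPorts="mlx5_0/1/1 mlx5_1/1/2" -N nsdNodes

    # A client with a single 25G port would join just one of those fabrics
    mmchconfig verbsPorts="mlx5_0/1/1" -N clientNodes

In our experience the daemon needs a restart to pick up verbsPorts changes; mmdiag --config (or mmlsconfig) is a quick way to check what is actually in effect.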
________________________________
From: gpfsug-discuss <[email protected]> on behalf of Luke Sudbery <[email protected]>
Sent: Friday, January 23, 2026 5:38:43 PM
To: gpfsug main discussion list <[email protected]>
Subject: [EXTERNAL] [gpfsug-discuss] Anyone using RoCE?

Is anyone using RoCE with good results? We are planning on it, but initial tests are not great – we get much better performance using plain Ethernet over the exact same links. It’s up and working, I can see RDMA connections and counters, no errors, but performance is unstable. And worse than Ethernet, which was only meant to be a sanity check!

Things I’ve looked at based on the Lenovo and IBM guides, which I think are all configured correctly (rough command-level examples of these steps are sketched at the end of this message):

* RoCE interfaces all on the same subnet
* They all have IPv6 enabled, with addresses using eui64 addr-gen-mode
* DSCP trust mode on the NICs
* PFC flow control on the NICs
* Global Pause disabled on the NICs
* ToS configured for RDMA_CM
* Source-based routing for multiple interfaces on the same subnet
* Switches (NVIDIA Cumulus) all enabled for RoCE QoS

Iperf and GPFS over plain Ethernet get nearly 3GB/s, which is close to the line speed of the NIC in question – 25Gbps. Testing basic RDMA connections with ib_send_bw gets about the same. But GPFS over RoCE gets anywhere from 0.7GB/s to 1.9GB/s.

The servers have 4x 200G Mellanox cards. The client has 1x 25G card. What’s frustrating and confusing is that we get better performance when we enable just one card at the server end, and also better performance if we have one fabric ID per NIC on the server (with all four fabric IDs on the same NIC at the client end).

I can go into more detail if anyone has experience with this! Does it sound familiar to anyone? I am planning to open a call with Lenovo and/or IBM, as I’m not quite sure where to look next.

Cheers,

Luke

--
Luke Sudbery
Principal Engineer (HPC and Storage)
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road

Please note I don’t work on Monday.
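For anyone following along, the host- and switch-side steps in that checklist look roughly like the commands below. This is only a sketch: the interface name (ens1f0), RDMA device (mlx5_0), connection name, priority, DSCP/ToS values, addresses and routing table number are placeholder examples, and the exact syntax depends on the NIC firmware/driver, OS and Cumulus versions in use.

    # IPv6 with eui64 addr-gen-mode on the RoCE interface (NetworkManager example)
    nmcli connection modify roce-ens1f0 ipv6.method auto ipv6.addr-gen-mode eui64

    # Trust DSCP and enable PFC on a single priority (here priority 3), using the Mellanox mlnx_qos tool
    mlnx_qos -i ens1f0 --trust dscp --pfc 0,0,0,1,0,0,0,0

    # Disable global pause, since PFC is used instead
    ethtool -A ens1f0 rx off tx off

    # ToS for RDMA_CM connections (106 = DSCP 26 with ECN bits), via the rdma_cm configfs interface
    mkdir -p /sys/kernel/config/rdma_cm/mlx5_0
    echo 106 > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_tos

    # Source-based routing so traffic from each address leaves via the interface that owns it
    ip rule add from 10.10.1.11/32 table 101
    ip route add 10.10.1.0/24 dev ens1f0 src 10.10.1.11 table 101

    # On the Cumulus switches (NVUE), enable lossless RoCE QoS
    nv set qos roce mode lossless
    nv config apply

Running mlnx_qos -i ens1f0 with no other arguments prints the current trust and PFC state, which is a quick way to confirm the NIC actually picked the settings up.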
