Is anyone using RoCE with good results? We are planning on it, but initial 
tests are not great - we get much better performance using plain Ethernet over 
the exact same links.

It's up and working: I can see RDMA connections and counters, with no errors, but 
performance is unstable, and worse than plain Ethernet, which was only meant to be 
a sanity check!
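
To put a number on "unstable", something like the script below would capture RDMA 
hardware counter deltas around a test run. It's only a rough Python sketch: the 
device name is a placeholder and the hw_counters path and counter names are 
mlx5-style assumptions, so adjust for your hardware.

    #!/usr/bin/env python3
    # Rough sketch: snapshot the RDMA hw_counters before and after a test run
    # so congestion/retransmission activity can be compared between runs.
    # DEV is a placeholder; counter names are driver-specific (mlx5-style).
    import glob, os, time

    DEV = "mlx5_0"                       # placeholder - check with ibv_devices
    PORT = "1"

    def read_counters():
        base = f"/sys/class/infiniband/{DEV}/ports/{PORT}/hw_counters"
        vals = {}
        for path in glob.glob(os.path.join(base, "*")):
            try:
                with open(path) as f:
                    vals[os.path.basename(path)] = int(f.read().strip())
            except (OSError, ValueError):
                pass
        return vals

    before = read_counters()
    time.sleep(10)                       # run the RDMA/GPFS test in this window
    after = read_counters()
    for name in sorted(after):
        delta = after[name] - before.get(name, 0)
        if delta:
            print(f"{name}: +{delta}")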

Things I've looked at, based on the Lenovo and IBM guides, which I think are all 
configured correctly (one example check is sketched just after this list):

  *   RoCE interfaces all on the same subnet
  *   They all have IPv6 enabled, with addresses using eui64 addr-gen-mode
  *   DSCP trust mode on the NICs
  *   PFC flow control on the NICs
  *   Global Pause disabled on the NICs
  *   ToS configured for RDMA_CM
  *   Source-based routing for multiple interfaces on the same subnet
  *   Switches (NVIDIA Cumulus) all enabled for RoCE QoS
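
As an example of the kind of double-check I mean for the IPv6/eui64 item, the 
rough sketch below just compares an interface's global IPv6 address suffix 
against the EUI-64 identifier derived from its MAC. The interface name is a 
placeholder.

    #!/usr/bin/env python3
    # Rough sketch: verify the interface carries an EUI-64 derived global IPv6
    # address, i.e. the interface identifier is the MAC with the U/L bit
    # flipped and ff:fe inserted in the middle. IFACE is a placeholder.
    import ipaddress, subprocess

    IFACE = "ens1f0"                     # placeholder - one of the RoCE ports

    def eui64_suffix(mac: str) -> bytes:
        b = bytearray(int(x, 16) for x in mac.split(":"))
        b[0] ^= 0x02                     # flip the universal/local bit
        return bytes(b[:3]) + b"\xff\xfe" + bytes(b[3:])

    mac = open(f"/sys/class/net/{IFACE}/address").read().strip()
    want = eui64_suffix(mac)

    out = subprocess.run(["ip", "-6", "addr", "show", "dev", IFACE],
                         capture_output=True, text=True, check=True).stdout
    ok = False
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("inet6") and "scope global" in line:
            addr = ipaddress.IPv6Address(line.split()[1].split("/")[0])
            if addr.packed[8:] == want:
                print(f"{IFACE}: EUI-64 address {addr} matches MAC {mac}")
                ok = True
    if not ok:
        print(f"{IFACE}: no EUI-64 derived global address found")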

Over plain Ethernet, iperf and GPFS both get nearly 3 GB/s, which is close to the 
line speed of the NIC in question (25 Gbps). Testing basic RDMA connections with 
ib_send_bw gets about the same. But GPFS over RoCE ranges from 0.7 GB/s to 1.9 GB/s.
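
For context on "near line speed": 25 Gbps is about 3.1 GB/s raw, and somewhere 
around 2.9 GB/s once framing overhead comes off, so ~2.9 GB/s over plain Ethernet 
is essentially the best case, while the RoCE numbers work out to roughly 25-65% of 
that. Rough arithmetic (the overhead fraction is just an estimate):

    # Back-of-the-envelope check of the 25GbE numbers above.
    line_rate_gbps = 25
    raw_gbs = line_rate_gbps / 8                   # 3.125 GB/s on the wire
    framing_overhead = 0.06                        # rough guess for headers/preamble/IFG
    usable_gbs = raw_gbs * (1 - framing_overhead)  # ~2.9 GB/s
    for observed in (0.7, 1.9, 2.9):
        print(f"{observed} GB/s is {observed / usable_gbs:.0%} of usable 25GbE bandwidth")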

The servers have 4x 200G Mellanox cards; the client has 1x 25G card. What's 
frustrating and confusing is that we get better performance when we enable only 
one card at the server end, and also better performance if we use one fabric ID 
per NIC on the server (with all four fabric IDs on the single NIC at the client 
end).
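
To narrow down whether one particular card or path is the slow one when all four 
are active, something like the client-side loop below over ib_send_bw against each 
server interface might help. The addresses and client device name are placeholders, 
and a matching "ib_send_bw -R" server has to be started on each target interface 
first.

    #!/usr/bin/env python3
    # Rough sketch: run ib_send_bw (via rdma_cm, -R) from the client against
    # each server interface in turn and print the bandwidth summary lines,
    # to see whether one card/path stands out. Addresses are placeholders.
    import subprocess

    CLIENT_DEV = "mlx5_0"                         # placeholder client HCA
    SERVER_ADDRS = ["10.10.0.1", "10.10.0.2",     # placeholder per-NIC addresses
                    "10.10.0.3", "10.10.0.4"]

    for addr in SERVER_ADDRS:
        print(f"--- {addr} ---")
        result = subprocess.run(
            ["ib_send_bw", "-R", "--report_gbits", "-d", CLIENT_DEV, addr],
            capture_output=True, text=True)
        print("\n".join(result.stdout.splitlines()[-5:]))
        if result.returncode != 0:
            print(result.stderr.strip())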

I can go into more details if anyone has experience! Does this sound familiar 
to anyone? I am planning to open a call with Lenovo and/or IBM as I'm not quite 
sure where to look next.

Cheers,

Luke

--
Luke Sudbery
Principal Engineer (HPC and Storage).
Architecture, Infrastructure and Systems
Advanced Research Computing, IT Services
Room 132, Computer Centre G5, Elms Road

Please note I don't work on Monday.

