Hello All,

I would like to understand which Ethernet switches and architectures seem to 
work best with GPFS. 
We had considered InfiniBand, but we are not yet ready to move to it because of 
the complexity, upgrade effort, and debugging issues that come with it. 

Current hardware:

We are currently using an Arista 7328x 100G core switch for networking among 
the GPFS clusters and the compute nodes.

It is a heterogeneous network, with servers on 10G/25G/100G, some with LACP and 
some without.

For example: 

GPFS storage clusters either have 25G LACP, or 10G LACP, or a single 100G 
network port.
Compute nodes range from 10G to 100G.
Login nodes, transfer servers, etc. have bonded 25G (a rough sketch of the 
bonding options we use is below this list).
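
For reference, the LACP hosts use plain Linux 802.3ad bonds. A minimal sketch, 
assuming RHEL-style ifcfg files; the hash policy and interface names here are 
illustrative, not our exact settings:

    # ifcfg-bond0 options (illustrative); mode=802.3ad is LACP,
    # layer3+4 hashing spreads flows across the member links
    BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4 lacp_rate=fast"

    # quick check that both members joined the same aggregator
    grep -E "Mode|Aggregator ID|Slave Interface" /proc/net/bonding/bond0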

Most of the servers have Mellanox ConnectX-4 or ConnectX-5 adapters, but we 
also have a few older Intel, Broadcom, and Chelsio network cards in the 
clusters.

Most of the transceivers that we use are Mellanox, Finisar, and Intel.

Issue:

We upgraded to the above switch recently and have found that it is not able to 
handle the traffic, because of the mismatch between the higher NSD server 
bandwidth and the lower compute node bandwidth.

One issue that we did see was a lot of network discards on the switch side, 
along with network congestion and slow I/O performance on the affected compute 
nodes.
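
For anyone comparing notes, this is roughly how we spot it; the interface name 
is a placeholder and the Arista output may differ by EOS version:

    # on the Arista side: per-port discard counters
    show interfaces counters discards

    # on a compute/NSD node: NIC-level drop/discard counters (names vary by driver)
    ethtool -S ens2f0 | grep -iE "discard|drop"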

Once we enabled ECN, we did see that it reduced the network congestion.
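
ECN has to be marked by the switch and honored by the hosts. The switch-side 
WRED/ECN marking is vendor-specific, so I will only show the host side, which 
on Linux is a single sysctl:

    # enable ECN for TCP (requested on outgoing, accepted on incoming connections)
    sysctl -w net.ipv4.tcp_ecn=1

    # persist across reboots
    echo "net.ipv4.tcp_ecn = 1" > /etc/sysctl.d/90-ecn.conf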

We do see expels once in a while, but those are mostly related to network 
errors or a host not responding. We observed that bonding/LACP makes expels 
much trickier to debug, so we have decided to go without LACP until the GPFS 
code gets better at handling it, which I think they are working on.
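
When we chase an expel, we mostly just correlate the GPFS log with the NIC and 
switch counters; something along these lines (the log path is the default 
location, and the interface name is a placeholder):

    # look for expel events and the peers involved
    grep -i expel /var/adm/ras/mmfs.log.latest

    # cross-check against link-level errors/drops on the data interface
    ip -s link show ens2f0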

We have heard that our current switch is a shallow-buffer switch, and that we 
would need a deeper-buffer Arista switch to avoid the congestion and get lower 
latency and more throughput.

On the other side, Mellanox promises better ASIC and buffer design in a 
spine-leaf architecture, instead of one deep-buffer core switch, to get better 
performance than Arista.

Most of the applications that run on the clusters are either genomics 
applications on CPUs or deep learning applications on GPUs. 

All of our GPFS storage clusters are above version 5.0.2, with the compute 
filesystems at a 16M block size on nearline rotating disks and the flash 
storage at a 512K block size.
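
For completeness, those numbers come from the usual commands; the filesystem 
device name below is a placeholder:

    # filesystem block size (16M on the nearline pools, 512K on flash)
    mmlsfs gpfs_compute -B

    # GPFS daemon version on a node
    mmdiag --version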


May I ask for feedback from anyone who is using Arista or Mellanox switches in 
their clusters, to understand the pros and cons, the stability, and the 
performance numbers of each?


Thank you,
Lohit