Hello All,
I would like to discuss, and better understand, which Ethernet networking
switches/architectures seem to work best with GPFS.
We had considered InfiniBand, but we are not yet ready to move to it because of
the complexity, upgrade effort, and debugging issues that come with it.
Current hardware:
We are currently using an Arista 7328x 100G core switch for networking between
the GPFS clusters and the compute nodes.
It is a heterogeneous network, with servers on 10G/25G/100G, some with LACP and
some without.
For example:
The GPFS storage clusters have either 25G LACP, 10G LACP, or a single 100G
network port.
Compute nodes range from 10G to 100G.
Login nodes, transfer servers, etc. have bonded 25G.
Most of the servers have Mellanox ConnectX-4 or ConnectX-5 adapters, but we
also have a few older Intel, Broadcom, and Chelsio network cards in the
clusters. Most of the transceivers that we use are Mellanox, Finisar, and
Intel.
Issue:
We upgraded to the above switch recently and have seen that it is not able to
handle the network traffic, because the NSD servers sit on much higher
bandwidth links than the compute nodes, so many fast senders fan in to slower
receiving ports.
One issue that we did see was a lot of network discards on the switch side,
along with network congestion and slow I/O performance on the affected compute
nodes.
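For context, this is roughly how we have been spotting the discards (the
interface name below is just a placeholder for whichever NIC is involved):

    # on the Arista switch (EOS CLI)
    show interfaces counters discards
    # on the Linux hosts, per-NIC drop/discard counters
    ethtool -S ens1f0 | grep -iE 'discard|drop'
    ip -s link show dev ens1f0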
Once we enabled ECN, we did see that it reduced the network congestion.
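On the Linux host side that was just the standard TCP ECN sysctl; the switch
also needs ECN marking configured on its egress queues, which I am leaving out
here:

    # enable ECN negotiation for TCP on the hosts
    sysctl -w net.ipv4.tcp_ecn=1
    # persist the setting across reboots
    echo 'net.ipv4.tcp_ecn = 1' > /etc/sysctl.d/90-ecn.conf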
We do see expels once in a while, but those are mostly related to network
errors or a host not responding. We observed that bonding/LACP makes expels
much trickier to debug, so we have decided to go without LACP until the GPFS
code gets better at handling LACP, which I believe they are working on.
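For anyone curious, when we chase expels we mostly look at the GPFS connection
state and logs, plus the LACP state on the bonded hosts (the bonding options
shown are only illustrative, not a recommendation):

    # GPFS view of node-to-node connections
    mmdiag --network
    # expel messages in the GPFS log
    grep -i expel /var/adm/ras/mmfs.log.latest
    # negotiated LACP state and per-slave status on a bonded host
    cat /proc/net/bonding/bond0
    # the bonds themselves were plain 802.3ad, e.g.
    # BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"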
We have heard that our current switch is a shallow-buffer switch, and that we
would need a deep-buffer Arista switch to perform better, with less congestion,
lower latency, and more throughput.
On the other side, Mellanox claims that its ASIC design and buffer architecture
in a spine-leaf topology, instead of a single deep-buffer core switch, will
give better performance than Arista.
Most of the applications that run on the clusters are either genomics
applications on CPUs or deep learning applications on GPUs.
All of our GPFS storage clusters are at version 5.0.2 or above, with the
compute filesystems using a 16M block size on nearline rotating disks and the
flash storage using a 512K block size.
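(Block sizes as reported by mmlsfs; the filesystem names below are just
placeholders for ours:

    mmlsfs compute_fs -B
    mmlsfs flash_fs -B)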
May I ask for feedback from anyone who is using Arista or Mellanox switches
with their clusters, to understand the pros and cons, the stability, and the
performance numbers of each?
Thank you,
Lohit