On 22/08/2023 10:51, Kidger, Daniel wrote:

Jonathan,

Thank you for the great answer!
Just to be clear though - are you talking about TCP/IP mounting of the
filesystem(s) rather than RDMA?


Yes, for a few reasons. Firstly, a bunch of our Ethernet adaptors don't support RDMA. Secondly, there are a lot of ducks to get in line, and keep in line, for RDMA to work, and that's too much effort IMHO. Thirdly, the nodes can peg the 10Gbps interface they have, which is a hard QoS that we are happy with (though if specifying today we would have 25Gbps to the compute nodes and 100, possibly 200, Gbps on the DSS-G nodes). Basically we don't want one node to go nuts and monopolize the filesystem :-) The DSS-G nodes don't have an issue keeping up, so I am not sure there is much performance benefit to be had from RDMA.
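For reference, a quick sanity check that a Spectrum Scale cluster is on plain TCP/IP rather than verbs RDMA, assuming the standard admin tooling is on the path (exact output will vary):

    # Is verbs RDMA enabled anywhere in the cluster? ("no" or unset means plain TCP/IP)
    mmlsconfig verbsRdma
    # What the daemon is actually using for node-to-node traffic on this node
    mmdiag --network | head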

That said, you are supposed to be able to do IPoIB over the RDMA hardware's network, and I had presumed that the same could be said of TCP/IP over RDMA on Ethernet.

I think routing of RDMA is perhaps something only Lustre can do?


Possibly. Something else is that we have our DSS-G nodes doing MLAGs across a pair of switches. I need to be able to do firmware updates on the network switches the DSS-G nodes are connected to without shutting down the cluster. Reading the switch manuals, I don't think you can do that with RDMA, so that's another reason not to do it IMHO. In the 2020s the mantra is patch baby patch, and everything is focused on making that quick and easy to achieve. Your expensive HPC system is worth jack if hackers have taken it over because you didn't patch it in a timely fashion. Also I would have a *lot* of explaining to do, which I would rather avoid.
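If anyone wants the node side of that, a minimal sketch of the LACP bond hanging off the MLAG pair, done with NetworkManager (the interface names and address here are made up, and the switch ports need a matching MLAG/LACP port-channel):

    # Create an 802.3ad (LACP) bond and enslave the two uplinks, one per switch
    nmcli con add type bond ifname bond0 con-name bond0 \
        bond.options "mode=802.3ad,miimon=100,lacp_rate=fast"
    nmcli con add type ethernet ifname ens1f0 con-name bond0-port1 master bond0
    nmcli con add type ethernet ifname ens1f1 con-name bond0-port2 master bond0
    # Example address only
    nmcli con mod bond0 ipv4.method manual ipv4.addresses 10.10.0.10/24
    nmcli con up bond0

Either switch can then be rebooted for a firmware update and the bond just carries on over the surviving link.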

Also, in our experience storage is rarely the bottleneck, and when it is, e.g. Gromacs creating a ~1TB temp file at 10Gbps (yeah, that's a real thing we have observed on a fairly regular basis), that's an intended QoS so everyone else can get work done and I don't get a bunch of tickets from users complaining about the filesystem performing badly. We have seen enough simultaneous Gromacs runs that without the 10Gbps hard QoS the filesystem would have been brought to its knees.
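To put a rough number on that: 1TB is about 8 x 10^12 bits, so at a hard 10Gbps (10^10 bits/s) cap one of those temp files takes roughly 800 seconds to write, i.e. a single job can hammer the filesystem flat out for about 13 minutes and still never take more than one node's worth of bandwidth.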

We can't do the temp files locally on the node because we only spec'ed them with 1TB local disks, and the Gromacs temp files regularly exceed the available local space. Also, getting users to do it would be a nightmare :-)


JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

