Hi Chris, that is basically what we are planning to do: use RoCE v2 for the LNet routers and InfiniBand for the HPC part of the cluster. As I mentioned before, the problem is that the existing Lustre storage will be in a different location from the new facility. At least, that is the latest we have been told. To make things a bit more interesting: currently we *are* using InfiniBand (both Mellanox and the Intel variety) to connect from the compute nodes to Lustre. So we have two problems: how to connect the new world to the old one, and what to do with the HPC workloads we are running. This is where the mixed RoCE/InfiniBand design came up.
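For anyone following along, here is a minimal sketch of what I understand the router setup to look like, along the lines of the LNet Router Config Guide Chris linked below. The interface names (ib0, enp4s0), network numbers (o2ib0/o2ib1) and the 10.10.0.1 router address are made up for illustration; note that the RoCE side still uses the o2ib LND, since ko2iblnd speaks RDMA verbs over both InfiniBand and RoCE:

    # /etc/modprobe.d/lustre.conf on the LNet router
    # (one leg on the IB fabric, one on the RoCE/Ethernet fabric)
    options lnet networks="o2ib0(ib0),o2ib1(enp4s0)" forwarding="enabled"

    # /etc/modprobe.d/lustre.conf on a compute node (IB side only),
    # sending o2ib1 traffic via the router's NID on the IB fabric
    options lnet networks="o2ib0(ib0)" routes="o2ib1 10.10.0.1@o2ib0"

The Lustre servers on the RoCE side would carry the mirror image, with a routes entry pointing back at the router's NID on o2ib1.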
I hope this, together with what I wrote in the other replies, makes sense.

All the best from London, still cold and dark. :-)

Jörg

On Friday, 27 November 2020 at 07:13:46 GMT, Chris Samuel wrote:
> On Thursday, 26 November 2020 3:14:05 AM PST Jörg Saßmannshausen wrote:
> > Now, traditionally I would say that we are going for InfiniBand. However,
> > for reasons I don't want to go into right now, our existing file storage
> > (Lustre) will be in a different location. Thus, we decided to go for RoCE
> > for the file storage and InfiniBand for the HPC applications.
>
> I think John hinted at this, but is there a reason for not going for IB for
> the cluster and then using LNet routers to connect out to the Lustre storage
> via Ethernet (with RoCE)?
>
> https://wiki.lustre.org/LNet_Router_Config_Guide
>
> We use LNet routers on our Cray system to bridge between the Aries
> interconnect inside the XC and the IB fabric our Lustre storage sits on.
>
> All the best,
> Chris