Morning (Hi Gilad)

We run RoCE over Mellanox 100G Ethernet and get 1.3 us latency for the
shortest hop, increasing slightly as you go deeper into the fabric.
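
If you want to sanity-check a number like that at the application level, a
simple MPI ping-pong is usually enough. A minimal sketch with mpi4py
(assuming mpi4py and numpy are installed; expect a somewhat higher figure
than the raw RDMA latency because of MPI overhead):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    msg = np.zeros(1, dtype='b')   # 1-byte message to approximate pure latency
    reps = 10000

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(msg, dest=1, tag=0)
            comm.Recv(msg, source=1, tag=0)
        elif rank == 1:
            comm.Recv(msg, source=0, tag=0)
            comm.Send(msg, dest=0, tag=0)
    t1 = MPI.Wtime()

    if rank == 0:
        # one-way latency is half the average round-trip time
        print(f"one-way latency: {(t1 - t0) / (2 * reps) * 1e6:.2f} us")

Run it with two ranks pinned to two different nodes and you get a rough
one-way latency per hop.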

We run Ethernet for a full dual-plane fat-tree :)  It is 100% possible with
Mellanox :)

We love it.


On Fri, Jan 15, 2021 at 8:40 PM Jörg Saßmannshausen
<sassy-w...@sassy.formativ.net> wrote:

> Hi Gilad,
>
> thanks for the feedback, much appreciated.
> In an ideal world, you are right of course. OpenStack is supported
> natively on InfiniBand, and you can get the MetroX system to connect
> between two different sites (I leave it open how to read that) etc.
>
> However, in the real world all of that needs to fit into a budget. From
> what I can see on the cluster, most jobs are in the region between 64 and
> 128 cores. So that raises the question: for that rather small number of
> cores, do we really need InfiniBand, or can we do what we need to do with
> RoCE v2?
>
> In other words, for the same budget, does it make sense to remove the
> InfiniBand part of the design and get, say, one GPU box in instead?
>
> What I want to avoid is making the wrong decision (cheap and cheerful)
> and ending up with a badly designed cluster later.
>
> As you mentioned MetroX: remind me please, what kind of cable does it
> need? Is that something special, or can we use the existing cables,
> whatever is used between data centre sites (sic!)?
>
> We had a chat with Darren about that, which was, as always when talking
> to your colleague Darren, very helpful. I remember quite distinctly that
> there was a reason why we went for the InfiniBand/RoCE solution, but I
> cannot really recall it. It had something to do with the GPU boxes we
> want to buy as well.
>
> I will pass your comments on to my colleague next week when I am back at
> work and see what they say. Many thanks for your sentiments here, which I
> much appreciate!
>
> All the best from a cold London
>
> Jörg
>
> On Thursday, 26 November 2020 at 12:51:55 GMT, Gilad Shainer wrote:
> > Let me try to help:
> >
> > - OpenStack is supported natively on InfiniBand already, therefore
> >   there is no need to go to Ethernet for that.
> >
> > - File system wise, you can have an IB file system and connect directly
> >   to the IB system.
> >
> > - Depending on the distance, you can run 2 km of IB between switches, or
> >   use Mellanox MetroX for connecting over 40 km. VicinityIO have systems
> >   that go over thousands of miles…
> >
> > - The IB advantages are much lower latency (switches alone have 3X lower
> >   latency), cost effectiveness (for the same speed, IB switches are more
> >   cost effective than Ethernet) and the In-Network Computing engines
> >   (MPI reduction operations and Tag Matching run on the network).
> >
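The In-Network Computing engines mentioned above offload ordinary MPI
collectives such as Allreduce into the switch fabric. Purely as an
illustration of the kind of operation that benefits, here is a minimal
mpi4py timing sketch (assuming mpi4py and numpy are installed; the message
size is arbitrary):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # ~8 MB of doubles per rank; the size is only illustrative
    sendbuf = np.ones(1_000_000, dtype='d')
    recvbuf = np.empty_like(sendbuf)

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(100):
        comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)
    comm.Barrier()
    t1 = MPI.Wtime()

    if rank == 0:
        print(f"mean Allreduce time: {(t1 - t0) / 100 * 1e3:.3f} ms")

Running the same script over IB with the reduction offload enabled and over
RoCE without it gives a feel for how much the offload matters at your job
sizes.
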
> > If you need help, feel free to contact me directly.
> >
> > Regards,
> > Gilad Shainer
> >
> > From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of John Hearns
> > Sent: Thursday, November 26, 2020 3:42 AM
> > To: Jörg Saßmannshausen <sassy-w...@sassy.formativ.net>; Beowulf Mailing
> > List <beowulf@beowulf.org>
> > Subject: Re: [Beowulf] RoCE vs. InfiniBand
> >
> > Jörg, I think I might know where the Lustre storage is!
> > It is possible to install storage routers, so you could route between
> > Ethernet and InfiniBand. It is also worth saying that Mellanox have Metro
> > InfiniBand switches - though I do not think they go as far as the west of
> > London!
> > Seriously though, you ask about RoCE. I will stick my neck out and say
> > yes, if you are planning an OpenStack cluster with the intention of
> > having mixed AI and 'traditional' HPC workloads, I would go for a RoCE
> > style setup. In fact I am in a discussion about a new project for a
> > customer with similar aims in an hour's time.
> > I could get some benchmarking time if you want to do a direct comparison
> > of GROMACS on IB / RoCE.
> >
> > On Thu, 26 Nov 2020 at 11:14, Jörg Saßmannshausen
> > <sassy-w...@sassy.formativ.net> wrote:
> > Dear all,
> >
> > as the DNS problems have been solved (many thanks for doing this!), I was
> > wondering if people on the list have some experience with this question:
> >
> > We are currently in the process of purchasing a new cluster, and we want
> > to use OpenStack for the whole management of the cluster. Part of the
> > cluster will run HPC applications like GROMACS, for example; other parts
> > will run typical OpenStack applications like VMs. We are also
> > implementing a Data Safe Haven for the more sensitive data we are aiming
> > to process. Of course, we want to have a decent-sized GPU partition as
> > well!
> >
> > Now, traditionally I would say that we are going for InfiniBand. However,
> > for reasons I don't want to go into right now, our existing file storage
> > (Lustre) will be in a different location. Thus, we decided to go for RoCE
> > for the file storage and InfiniBand for the HPC applications.
> >
> > The point I am struggling with is to understand whether this is really
> > the best solution, or whether, given that we are not building a 100k-node
> > cluster, we could use RoCE for the few nodes which are doing parallel
> > (read: MPI) jobs too. I have a nagging feeling that I am missing
> > something if we move to pure RoCE and ditch the InfiniBand. We have a
> > mixed workload, from ML/AI to MPI applications like GROMACS to pipelines
> > like those used in the bioinformatics corner. We are not planning to
> > partition the GPUs; the current design model is to have only 2 GPUs in a
> > chassis.
> > So, is there something I am missing, or is the stomach feeling I have
> > really a lust for some sushi? :-)
> >
> > Thanks for your sentiments here, much welcomed!
> >
> > All the best from a dull London
> >
> > Jörg


-- 
Dr Stuart Midgley
sdm...@gmail.com
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
