Andrew,

I agree that the choice of hash function is important for LACP. My thinking has always been to stay down in layers 2 and 3.  With enough hosts it seems likely that traffic would be split close to evenly.  Heads or tails - 50% of the time you're right.  TCP ports should also be nearly equally split, but listening ports could introduce some asymmetry.
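A quick way to sanity-check how evenly a given hash policy is actually spreading flows is to compare the per-slave counters on the bond. Something along these lines (interface and bond names here are just placeholders for whatever the slaves are actually called):

  # per-slave byte/packet counters - compare them to see how even the split is
  ip -s link show eno1
  ip -s link show eno2

  # bond status: hash policy in effect, LACP state, per-slave details
  cat /proc/net/bonding/bond0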

What I'm concerned about is the next level up: with the client network and the cluster network (Marc's terms are more descriptive) on the same NICs/switch ports, with or without LACP and LAGs, it seems possible that at times the bandwidth consumed by cluster traffic could overwhelm and starve the client traffic. Or the other way around, which would be worse, since the cluster nodes then can't communicate on their 'private' network to keep the cluster consistent.  These overloads could happen in the packet queues in the NIC drivers, or maybe in the switch fabric.

Maybe these starvation scenarios aren't that likely in clusters with 10Gb networking.  Maybe it's hard to fill up a 10Gb pipe, much less two.  But it could happen with 1Gb NICs, even in LAGs of 4 or 6 ports, and with faster NVMe drives it will eventually be easy to fill a 10Gb pipe.

So, what could we do with some of the 'exotic' queuing mechanisms available in Linux to keep the balance - to ensure that the less busy category can still transmit its proportional share?  (And is 'proportional' the right answer, or should one side get a slight advantage?)
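One rough sketch of what I have in mind, using HTB on the bond.  This assumes a 2x10Gb LAG called bond0 and a cluster network on 192.168.100.0/24 (both made up here), and that the IP headers are still visible to the filter at the bond's root qdisc - which depends on how the VLAN tagging/offload is handled; matching on vlan_id with a flower filter would be the fallback:

  # HTB root on the bond; unclassified traffic falls into the client class (1:10)
  tc qdisc replace dev bond0 root handle 1: htb default 10

  # parent class capped at the aggregate LAG rate
  tc class add dev bond0 parent 1: classid 1:1 htb rate 20gbit ceil 20gbit

  # client traffic: guaranteed half the LAG, allowed to borrow up to the full 20Gb
  tc class add dev bond0 parent 1:1 classid 1:10 htb rate 10gbit ceil 20gbit

  # cluster traffic: same guarantee and ceiling
  tc class add dev bond0 parent 1:1 classid 1:20 htb rate 10gbit ceil 20gbit

  # steer anything destined for the cluster subnet into the cluster class
  tc filter add dev bond0 parent 1: protocol ip u32 match ip dst 192.168.100.0/24 flowid 1:20

The rate/ceil split is exactly where the 'proportional vs. slight advantage' question shows up - e.g. give the cluster class rate 12gbit and the client class 8gbit if the private side should win during contention.  At 10Gb+ line rates the burst/cburst and quantum values would probably need tuning as well.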

-Dave

Dave Hall
Binghamton University
kdh...@binghamton.edu
On 3/15/2021 12:48 PM, Andrew Walker-Brown wrote:

Dave

That’s the way our cluster is set up. It’s relatively small: 5 hosts, 12 OSDs.

Each host has 2x10G with LACP to the switches.  We’ve VLAN’d the public/private networks.

Making best use of the LACP LAG largely comes down to choosing the best hashing policy.  At the moment we’re using layer3+4 in both the Linux config and the switch configs.  We’re monitoring link utilisation to make sure the balancing is as close to equal as possible.
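For reference, the relevant bits of that setup look roughly like this with iproute2 (interface names, VLAN IDs and so on are examples, not our actual values):

  # bond with LACP (802.3ad) and a layer3+4 transmit hash
  ip link add bond0 type bond mode 802.3ad miimon 100 lacp_rate fast xmit_hash_policy layer3+4
  ip link set eno1 down; ip link set eno1 master bond0
  ip link set eno2 down; ip link set eno2 master bond0
  ip link set bond0 up

  # tagged VLANs for the public and cluster networks
  ip link add link bond0 name bond0.100 type vlan id 100   # public / client
  ip link add link bond0 name bond0.200 type vlan id 200   # cluster / private
  ip link set bond0.100 up
  ip link set bond0.200 up

The switch side needs a matching LAG/port-channel with the same hash policy and both VLANs trunked.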

Hope this helps

A

Sent from my iPhone

On 15 Mar 2021, at 16:39, Marc <m...@f1-outsourcing.eu> wrote:

I have client and cluster network on one 10gbit port (with different vlans).
I think many smaller clusters do this ;)

I've been thinking about ways to squeeze as much performance as possible
from the NICs on a Ceph OSD node.  The nodes in our cluster (6 x OSD, 3
x MGR/MON/MDS/RGW) have 2 x 10Gb ports.  Currently, one port is assigned
to the front-side network and one to the back-side network.  However,
there are times when the traffic on one side or the other is more
intense and might benefit from a bit more bandwidth.

The idea I had was to bond the two ports together and run the back-side
network in a tagged VLAN on the combined 20Gb LACP port.  In order to
keep the balance and prevent starvation of either side, it would be
necessary to apply some sort of weighted fair queuing mechanism via the
'tc' command.  The idea is that if the client side isn't using the full
10Gb/node and there is a burst of re-balancing activity, the bandwidth
consumed by the back-side traffic could swell to 15Gb or more.  Or vice
versa.
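Whether that actually works should be visible straight from the qdisc counters.  Assuming an HTB (or similar classful) setup on the bond, something like this would show each class's throughput and how much it has borrowed beyond its guarantee during a backfill:

  # per-class stats: bytes sent and, for HTB, how often each class borrowed from its parent
  tc -s class show dev bond0

  # watch it live during a burst of re-balancing
  watch -n 1 tc -s class show dev bond0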

From what I have read and studied, these algorithms are fairly
responsive to changes in load and would thus adjust rapidly if the
demand from either side suddenly changed.

Maybe this is a crazy idea, or maybe it's really cool.  Your thoughts?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu