Re: [ceph-users] Ceph networks, to bond or not to bond?

Sven Budde Fri, 06 Jun 2014 04:07:20 -0700

Hi all,

Thanks for the replies and the heads-up on the different bonding options. I'll toy around with them over the next few days; hopefully there is some stable setup that provides both HA and increased bandwidth.


Cheers,
Sven


On 05.06.2014 21:36, Cedric Lemarchand wrote:

Yes, I forgot to mention that. Of course LACP with stackable switches is the safest and easiest way, but sometimes, when budget is a constraint, you have to deal with it. The price difference between simple Gb switches and stackable ones is not negligible. You generally get what you pay for ;-)

But I think Linux bonding with a simple network design (2x1Gb for each Ceph network) could do the trick well, with some extra work. Maybe some cephers on this list could confirm that?


Cheers


On 05/06/2014 21:21, Scott Laird wrote:

Doing bonding without LACP is probably going to end up being painful. Sooner or later you're going to end up with one end thinking that bonding is working while the other end thinks that it's not, and half of your traffic is going to get black-holed.

I've had moderately decent luck running Ceph on top of a weird network by carefully controlling the source address that every outbound connection uses and then telling Ceph that it's running with a 1-network config. With Linux, the default source address of an outbound TCP connection is a function of the route that the kernel picks to send traffic to the remote end, and you can override it on a per-route basis (it's visible as the 'src' attribute in iproute). I have a mixed Infiniband+GigE network with each host running an OSPF routing daemon (for non-Ceph reasons, mostly), and the only two ways that I could get Ceph to be happy were:


1.  Turn off the Infiniband network.  Slow, and causes other problems.

2. Tell Ceph that there was no cluster network, and tell the OSPF daemon to always set src=$eth0_ip on routes that it adds. Then just pretend that the Ethernet network is the only one that exists, and sometimes you get a sudden and unexpected boost in bandwidth due to /32 routes that send traffic via Infiniband instead of Ethernet.

It works, but I wouldn't recommend it for production. It would have been cheaper for me to buy a 10 GigE switch and cards for my garage than to have debugged all of this, and that's just for a hobby project.


OTOH, it's probably the only way to get working multipathing for Ceph.
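
To check which source address the kernel would pick for a given peer (the route-dependent behaviour described above), here is a minimal Python sketch. It uses a UDP socket because connect() on UDP sends no packets and only triggers the route lookup; as far as I know the same source selection applies to TCP. The function name and the port number are arbitrary choices for illustration, not anything Ceph-specific.

```python
import socket


def kernel_source_address(dest_ip: str, port: int = 6789) -> str:
    """Return the local IPv4 address the kernel would use to reach dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket transmits nothing; it only makes the
        # kernel do a route lookup and choose a local source address.
        s.connect((dest_ip, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # Loopback is used here only so the example runs anywhere;
    # substitute the address of another node to see the route's src.
    print(kernel_source_address("127.0.0.1"))
```

Running this against the address of another node before and after changing the route's 'src' attribute should show whether the override took effect.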

On Thu, Jun 5, 2014 at 10:50 AM, Cedric Lemarchand <[email protected]> wrote:


    On 05/06/2014 18:27, Sven Budde wrote:
    > Hi Alexandre,
    >
    > thanks for the reply. As said, my switches are not stackable,
    > so using LACP seems not to be my best option.
    >
    > I'm seeking for an explanation how Ceph is utilizing two (or
    > more) independent links on both the public and the cluster network.

    AFAIK, Ceph does not support multiple IP links in the same "designated
    network" (i.e. the client/OSD networks). Ceph is not aware of link
    aggregation; it has to be done at the Ethernet layer, so:

    - if your switches are stackable, you can use traditional LACP on both
      sides (switch and Ceph)
    - if they are not, then as Mariusz said, use an appropriate bonding
      mode on the Ceph side and do not use LACP on the switches.
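
    As a rough aid for picking "the appropriate bonding mode", the sketch below lists the Linux bonding modes together with the switch-side support each one needs, as I understand it from the Linux bonding documentation. Treat the table as an assumption to verify against the documentation for your kernel; the helper name is made up for this example.

```python
# Linux bonding modes mapped to the switch-side configuration they need.
# None means the mode needs no special switch support, which makes it a
# candidate when the two switches are independent (not stackable).
BONDING_MODES = {
    "balance-rr": "static link aggregation on the switch",
    "active-backup": None,
    "balance-xor": "static link aggregation on the switch",
    "broadcast": "static link aggregation on the switch",
    "802.3ad": "LACP on the switch",
    "balance-tlb": None,
    "balance-alb": None,
}


def modes_without_switch_support():
    """Return the modes usable without any switch-side configuration."""
    return [mode for mode, need in BONDING_MODES.items() if need is None]


if __name__ == "__main__":
    for mode in modes_without_switch_support():
        print(mode)
```

    Of these, active-backup only provides failover, while balance-tlb and balance-alb also try to spread load; whether they behave well with a given pair of switches is something to test.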

    More info here:
    http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding

    Cheers !
    >
    > If I configure two IPs for the public network on two NICs, will
    > Ceph route traffic from its (multiple) OSDs on this node over
    > both IPs?
    >
    > Cheers,
    > Sven
    >
    > -----Original Message-----
    > From: Alexandre DERUMIER [mailto:[email protected]]
    > Sent: Thursday, 5 June 2014 18:14
    > To: Sven Budde
    > Cc: [email protected]
    > Subject: Re: [ceph-users] Ceph networks, to bond or not to bond?
    >
    > Hi,
    >
    >>> My low-budget setup consists of two gigabit switches, capable
    >>> of LACP, but not stackable. For redundancy, I'd like to have my
    >>> links spread evenly over both switches.
    > If you want to do LACP across both switches, they need to be
    > stackable.
    >
    > (or use active-backup bonding)
    >
    >>> My question where I didn't find a conclusive answer in the
    >>> documentation and mailing archives:
    >>> Will the OSDs utilize both 'single' interfaces per network, if I
    >>> assign two IPs per public and per cluster network? Or will
    >>> all OSDs just bind on one IP and use only a single link?
    > You just need one IP per bond.
    >
    > With LACP, load balancing uses a hash algorithm to distribute
    > TCP connections across the links
    > (which also means that one connection can't use more than one link).
    >
    > Check that your switch supports an IP+port hash algorithm
    > (xmit_hash_policy=layer3+4 is the equivalent on the Linux bonding side).
    >
    > That way, each osd->osd connection can be load-balanced, and the
    > same goes for your clients->osd connections.
    >
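
    To illustrate why one connection stays on one link while many connections spread out, here is a small Python sketch of a layer3+4 style hash. It follows the formula given in older versions of the Linux bonding documentation (port XOR, combined with the low 16 bits of the IP XOR, modulo the number of links); newer kernels and switch vendors may use a different hash, so take this as an assumption for illustration only. The function name and example addresses are invented.

```python
import ipaddress


def layer3_4_link(src_ip, src_port, dst_ip, dst_port, n_links):
    """Pick a link index for a TCP/UDP flow, layer3+4 style.

    Sketch of the formula from older Linux bonding documentation:
    ((src_port XOR dst_port) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod n_links
    """
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % n_links


if __name__ == "__main__":
    # The same 4-tuple always maps to the same link, so a single
    # connection never exceeds the bandwidth of one link.
    for src_port in range(50000, 50004):
        link = layer3_4_link("10.0.0.1", src_port, "10.0.0.2", 6800, 2)
        print(src_port, "->", "link", link)
```

    With 8-10 OSDs per node there are many distinct port pairs, so the flows should spread over both links reasonably well, but a single client stream is still capped at 1 Gbit/s.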
    > ----- Original Message -----
    >
    > From: "Sven Budde" <[email protected]>
    > To: [email protected]
    > Sent: Thursday, 5 June 2014 16:20:04
    > Subject: [ceph-users] Ceph networks, to bond or not to bond?
    >
    > Hello all,
    >
    > I'm currently building a new small cluster with three nodes,
    > each node having 4x 1 Gbit/s network interfaces available and
    > 8-10 OSDs running per node.
    >
    > I thought I'd assign 2x 1 Gb/s to the public network, and the
    > other 2x 1 Gb/s to the cluster network.
    >
    > My low-budget setup consists of two gigabit switches, capable
    > of LACP, but not stackable. For redundancy, I'd like to have my
    > links spread evenly over both switches.
    >
    > My question, for which I didn't find a conclusive answer in the
    > documentation and mailing list archives:
    > Will the OSDs utilize both 'single' interfaces per network, if
    > I assign two IPs per public and per cluster network? Or will all
    > OSDs just bind on one IP and use only a single link?
    >
    > I'd rather avoid bonding the NICs, because if one switch fails,
    > at least one node would be unavailable, in the worst case 2
    > (out of 3), rendering the cluster inoperable.
    >
    > Are there other options I missed? 10 GE is currently out of our
    > budget ;)
    >
    > Thanks,
    > Sven
    >
    >
    > _______________________________________________
    > ceph-users mailing list
    > [email protected] <mailto:[email protected]>
    > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
    >

    --
    Cédric


--
Cédric
