Hi Josef, all,

 

it’s never too late to join a party ;) The cheap switches don’t support MLAG
either.

 

I did some testing today with the balance-alb mode, which works fine so far
in this setup.

 

I’m able to have my links placed redundantly on both switches, utilize up to
2 Gb/s when talking to multiple peers, and fail over if a switch fails
…all I wanted :)

 

The only drawback I found so far: each peer-to-peer link is limited to a
single link’s speed (1 Gb/s), but the OSDs are chatty, so this shouldn’t be
an issue for my setup.
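
For reference, a minimal sketch of roughly what such a balance-alb bond looks
like, Debian-style /etc/network/interfaces with the ifenslave package
(interface names and addresses are placeholders):

    auto bond0
    iface bond0 inet static
        address 192.168.0.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode balance-alb
        bond-miimon 100

Balance-alb needs no switch support, but it does rely on the NIC drivers
allowing the MAC address to be changed while the interface is up.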

 

Cheers,

Sven

 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of
Josef Johansson
Sent: Saturday, 7 June 2014 23:48
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph networks, to bond or not to bond?

 

Hi,

Late to the party, but just to be sure, does the switch support MC-LAG or
MLAG by any chance?
There could be firmware updates that add this.

Cheers,
Josef

Sven Budde wrote on 2014-06-06 13:06:

Hi all,

thanks for the replies and the heads-up on the different bonding options. I'll
toy around with them in the next few days; hopefully there's a stable setup
possible that provides HA and increased bandwidth together.

Cheers,
Sven



 

On 05.06.2014 21:36, Cedric Lemarchand wrote:

Yes, forgot to mention that. Of course LACP and stackable switches are the
safest and easiest way, but sometimes budget is a constraint and you have to
deal with it. The price difference between simple Gb switches and stackable
ones is not negligible. You generally get what you pay for ;-)

But I think Linux bonding with a simple network design (2x 1Gb for each Ceph
network) could do the trick well, with some operational overhead. Maybe some
Cephers on this list could confirm that?

Cheers



On 05/06/2014 21:21, Scott Laird wrote:

Doing bonding without LACP is probably going to end up being painful.
Sooner or later you're going to end up with one end thinking that bonding is
working while the other end thinks that it's not, and half of your traffic
is going to get black-holed. 

 

I've had moderately decent luck running Ceph on top of a weird network by
carefully controlling the source address that every outbound connection uses
and then telling Ceph that it's running with a 1-network config.  With
Linux, the default source address of an outbound TCP connection is a
function of the route that the kernel picks to send traffic to the remote
end, and you can override it on a per-route basis (it's visible as the
'src' attribute in iproute).  I have a mixed Infiniband+GigE network with
each host running an OSPF routing daemon (for non-Ceph reasons, mostly), and
the only two ways that I could get Ceph to be happy were:

 

1.  Turn off the Infiniband network.  Slow, and causes other problems.

2.  Tell Ceph that there was no cluster network, and tell the OSPF daemon to
always set src=$eth0_ip on routes that it adds.  Then just pretend that the
Ethernet network is the only one that exists, and sometimes you get a sudden
and unexpected boost in bandwidth due to /32 routes that send traffic via
Infiniband instead of Ethernet.
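
For illustration, a minimal sketch of that per-route source pinning with
iproute2 (addresses, subnets, and the device name are hypothetical):

    # eth0 is assumed to hold 10.0.0.5/24; pin that address as the source
    # for traffic toward the 10.0.1.0/24 peers
    ip route replace 10.0.1.0/24 via 10.0.0.1 src 10.0.0.5

    # verify: the 'src' attribute shows the address the kernel will use
    ip route show 10.0.1.0/24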

 

It works, but I wouldn't recommend it for production.  It would have been
cheaper for me to buy a 10 GigE switch and cards for my garage than to have
debugged all of this, and that's just for a hobby project.

 

OTOH, it's probably the only way to get working multipathing for Ceph.

 

On Thu, Jun 5, 2014 at 10:50 AM, Cedric Lemarchand <ced...@yipikai.org> wrote:

On 05/06/2014 18:27, Sven Budde wrote:

> Hi Alexandre,
>
> thanks for the reply. As said, my switches are not stackable, so using
LACP seems not to be my best option.
>
> I'm seeking an explanation of how Ceph utilizes two (or more)
independent links on both the public and the cluster network.

AFAIK, Ceph does not support multiple IP links in the same "designated
network" (aka the client/OSD networks). Ceph is not aware of link
aggregation; it has to be done at the Ethernet layer, so:

- if your switches are stackable, you can use traditional LACP on both
sides (switch and Ceph)
- if they are not, then as Mariusz said, use the appropriate bonding mode
on the Ceph side and do not use LACP on the switches.

More info here:
http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding
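
For completeness, a minimal sketch of how the aggregated link ends up looking
to Ceph in ceph.conf; Ceph only sees the bond's subnet (the subnets below are
hypothetical):

    [global]
        # bond0 holds an address in the public range, bond1 in the cluster range
        public network  = 192.168.1.0/24
        cluster network = 192.168.2.0/24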

Cheers !

>
> If I configure two IPs for the public network on two NICs, will Ceph route
traffic from its (multiple) OSDs on this node over both IPs?
>
> Cheers,
> Sven
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
> Sent: Thursday, 5 June 2014 18:14
> To: Sven Budde
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph networks, to bond or not to bond?
>
> Hi,
>
>>> My low-budget setup consists of two gigabit switches, capable of LACP,
>>> but not stackable. For redundancy, I'd like to have my links spread
>>> evenly over both switches.
> If you want to do LACP across both switches, they need to be stackable.
>
> (or use active-backup bonding)
>
>>> My question, to which I didn't find a conclusive answer in the
>>> documentation or mailing list archives:
>>> Will the OSDs utilize both 'single' interfaces per network if I
>>> assign two IPs per public and per cluster network? Or will all OSDs
>>> just bind to one IP and use only a single link?
> You just need one IP per bond.
>
> With LACP, the load balancing uses a hash algorithm to load-balance TCP
connections.
> (That also means that a single connection can't use more than one link.)
>
> Check that your switch supports an IP+port hash algorithm
(xmit_hash_policy=layer3+4 in Linux LACP bonding).
>
> This way, each osd->osd connection can be load-balanced, and the same goes
for your clients->osd traffic.
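
For illustration, a minimal sketch of an 802.3ad (LACP) bond using that hash
policy, set through the bonding module options (file path and values are just
an example):

    # /etc/modprobe.d/bonding.conf
    options bonding mode=802.3ad miimon=100 xmit_hash_policy=layer3+4

    # verify the active policy on a running bond
    cat /sys/class/net/bond0/bonding/xmit_hash_policy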
>
> ----- Original Message -----
>
> De: "Sven Budde" <sven.bu...@itgration-gmbh.de
<mailto:sven.bu...@itgration-gmbh.de> >
> À: ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com> 
> Envoyé: Jeudi 5 Juin 2014 16:20:04
> Objet: [ceph-users] Ceph networks, to bond or not to bond?
>
> Hello all,
>
> I'm currently building a new small cluster with three nodes, each node
having 4x 1 Gbit/s network interfaces available and 8-10 OSDs running per
node.
>
> I thought I'd assign 2x 1 Gb/s to the public network, and the other 2x 1
Gb/s to the cluster network.
>
> My low-budget setup consists of two gigabit switches, capable of LACP, but
not stackable. For redundancy, I'd like to have my links spread evenly over
both switches.
>
> My question, to which I didn't find a conclusive answer in the documentation
or mailing list archives:
> Will the OSDs utilize both 'single' interfaces per network if I assign
two IPs per public and per cluster network? Or will all OSDs just bind to
one IP and use only a single link?
>
> I'd rather avoid bonding the NICs, because if one switch fails, there would
be at least one node unavailable, in the worst case 2 (out of 3) ... rendering
the cluster inoperable.
>
> Are there other options I missed? 10 GE is currently out of our budget ;)
>
> Thanks,
> Sven
>
>

--
Cédric



 





-- 
Cédric








 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
