Yes, I forgot to mention that: of course LACP with stackable switches is
the safest and easiest way, but sometimes budget is a constraint and you
have to deal with it. The price difference between simple Gb switches and
stackable ones is not negligible. You generally get what you pay for ;-)

But I think Linux bonding with a simple network design (2x 1Gb for each
Ceph network) could do the trick, at the cost of some extra work. Maybe
some Cephers on this list could confirm that?
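
For illustration, a non-LACP bond on the Ceph side could look something
like this (untested sketch; interface names, addresses and networks are
only examples):

    # /etc/network/interfaces (Debian style, ifenslave installed)
    auto bond0
    iface bond0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode active-backup   # no switch-side configuration required
        bond-miimon 100

    auto bond1
    iface bond1 inet static
        address 192.168.2.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode active-backup
        bond-miimon 100

    # ceph.conf then just sees one IP/network per bond:
    [global]
        public network  = 192.168.1.0/24   # bond0
        cluster network = 192.168.2.0/24   # bond1

With one leg of each bond on each switch, losing a switch just triggers a
failover and the node stays reachable. Note that active-backup only gives
redundancy, not extra bandwidth; balance-alb is the usual candidate if you
also want to use both NICs without LACP, but I have not tested it across
two independent switches.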

Cheers


On 05/06/2014 21:21, Scott Laird wrote:
> Doing bonding without LACP is probably going to end up being painful.
>  Sooner or later you're going to end up with one end thinking that
> bonding is working while the other end thinks that it's not, and half
> of your traffic is going to get black-holed.
>
> I've had moderately decent luck running Ceph on top of a weird network
> by carefully controlling the source address that every outbound
> connection uses and then telling Ceph that it's running with a
> 1-network config.  With Linux, the default source address of an
> outbound TCP connection is a function of the route that the kernel
> picks to send traffic to the remote end, and you can override it on a
> per-route basis (it's visible as the 'src' attribute in iproute).
>  I have a mixed Infiniband+GigE network with each host running an OSPF
> routing daemon (for non-Ceph reasons, mostly), and the only two ways
> that I could get Ceph to be happy were:
>
> 1.  Turn off the Infiniband network.  Slow, and causes other problems.
> 2.  Tell Ceph that there was no cluster network, and tell the OSPF
> daemon to always set src=$eth0_ip on routes that it adds.  Then just
> pretend that the Ethernet network is the only one that exists, and
> sometimes you get a sudden and unexpected boost in bandwidth due to
> /32 routes that send traffic via Infiniband instead of Ethernet.
>
> It works, but I wouldn't recommend it for production.  It would have
> been cheaper for me to buy a 10 GigE switch and cards for my garage
> than to have debugged all of this, and that's just for a hobby project.
>
> OTOH, it's probably the only way to get working multipathing for Ceph.
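
(If I understand Scott's trick correctly, the plain-iproute equivalent
would be something along these lines -- my guess only, not tested, and
the addresses are made up:

    # pin the source address used for outbound connections on these routes
    ip route replace 192.168.1.0/24 dev eth0 src 192.168.1.11
    ip route replace default via 192.168.1.1 dev eth0 src 192.168.1.11

with ceph.conf then declaring only "public network = 192.168.1.0/24" and
no "cluster network" line at all.)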
>
>
> On Thu, Jun 5, 2014 at 10:50 AM, Cedric Lemarchand <ced...@yipikai.org> wrote:
>
>     On 05/06/2014 18:27, Sven Budde wrote:
>     > Hi Alexandre,
>     >
>     > thanks for the reply. As said, my switches are not stackable, so
>     using LACP doesn't seem to be my best option.
>     >
>     > I'm seeking an explanation of how Ceph utilizes two (or
>     more) independent links on both the public and the cluster network.
>     AFAIK, Ceph does not support multiple IP links in the same "designated
>     network" (aka client/OSD networks). Ceph is not aware of link
>     aggregation; it has to be done at the Ethernet layer, so:
>
>     - if your switches are stackable, you can use traditional LACP on both
>     sides (switch and Ceph)
>     - if they are not, then, as Mariusz said, use the appropriate bonding mode
>     on the Ceph side and do not use LACP on the switches.
>
>     More info here:
>     http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding
>
>     Cheers !
>     >
>     > If I configure two IPs for the public network on two NICs, will
>     Ceph route traffic from its (multiple) OSDs on this node over both
>     IPs?
>     >
>     > Cheers,
>     > Sven
>     >
>     > -----Original Message-----
>     > From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
>     > Sent: Thursday, 5 June 2014 18:14
>     > To: Sven Budde
>     > Cc: ceph-users@lists.ceph.com
>     > Subject: Re: [ceph-users] Ceph networks, to bond or not to bond?
>     >
>     > Hi,
>     >
>     >>> My low-budget setup consists of two gigabit switches, capable
>     of LACP,
>     >>> but not stackable. For redundancy, I'd like to have my links
>     spread
>     >>> evenly over both switches.
>     > If you want to do LACP across both switches, they need to be
>     stackable.
>     >
>     > (or use active-backup bonding)
>     >
>     >>> My question, for which I didn't find a conclusive answer in the
>     >>> documentation or mailing list archives:
>     >>> Will the OSDs utilize both 'single' interfaces per network if I
>     >>> assign two IPs per public and per cluster network? Or will all OSDs
>     >>> just bind to one IP and use only a single link?
>     > you just need one IP per bond.
>     >
>     > with LACP, the load balancing uses a hash algorithm to
>     load-balance TCP connections.
>     > (that also means that one connection can't use more than one link)
>     >
>     > check that your switch supports an IP+port hash algorithm
>     (xmit_hash_policy=layer3+4 in Linux LACP bonding)
>     >
>     > this way, each OSD->OSD connection can be load-balanced, and the
>     same for your clients->OSD.
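
(For completeness, the LACP bond Alexandre describes would look roughly
like this on the Linux side -- again just a sketch, interface names and
addresses are examples:

    auto bond0
    iface bond0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode 802.3ad                  # LACP
        bond-miimon 100
        bond-xmit-hash-policy layer3+4     # hash on IP + TCP/UDP port
        bond-lacp-rate fast

plus a matching LACP port-channel with an L3+L4 hash on the switch/stack.)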
>     >
>     >
>     >
>     >
>     >
>     >
>     > ----- Original Message -----
>     >
>     > From: "Sven Budde" <sven.bu...@itgration-gmbh.de>
>     > To: ceph-users@lists.ceph.com
>     > Sent: Thursday, 5 June 2014 16:20:04
>     > Subject: [ceph-users] Ceph networks, to bond or not to bond?
>     >
>     > Hello all,
>     >
>     > I'm currently building a new small cluster with three nodes,
>     each node having 4x 1 Gbit/s network interfaces available and 8-10
>     OSDs running per node.
>     >
>     > I thought I'd assign 2x 1 Gb/s to the public network, and the
>     other 2x 1 Gb/s to the cluster network.
>     >
>     > My low-budget setup consists of two gigabit switches, capable of
>     LACP, but not stackable. For redundancy, I'd like to have my links
>     spread evenly over both switches.
>     >
>     > My question, for which I didn't find a conclusive answer in the
>     documentation or mailing list archives:
>     > Will the OSDs utilize both 'single' interfaces per network if I
>     assign two IPs per public and per cluster network? Or will all
>     OSDs just bind to one IP and use only a single link?
>     >
>     > I'd rather avoid bonding the NICs, because if one switch fails, at
>     least one node would be unavailable, in the worst case 2 (out of 3)
>     ...rendering the cluster inoperable.
>     >
>     > Are there other options I've missed? 10 GbE is currently out of our
>     budget ;)
>     >
>     > Thanks,
>     > Sven
>     >
>     >
>
>     --
>     Cédric
>

-- 
Cédric

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
