Yes, I forgot to mention that: of course LACP with stackable switches is the safest and easiest way, but sometimes, when budget is a constraint, you have to deal with it. The price difference between simple gigabit switches and stackable ones is not negligible. You generally get what you pay for ;-)

But I think Linux bonding with a simple network design (2x 1 Gb for each Ceph network) could do the trick, with some extra work. Maybe some Cephers on this list could confirm that? Cheers
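For illustration only (not a tested config, and assuming Debian-style ifupdown with ifenslave; interface names, addresses and subnets are placeholders), the no-LACP variant I have in mind is one active-backup bond per Ceph network, with one leg of each bond plugged into each switch:

    # /etc/network/interfaces -- placeholder addresses, adapt to your setup
    auto bond0
    iface bond0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode active-backup
        bond-miimon 100

    auto bond1
    iface bond1 inet static
        address 192.168.20.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode active-backup
        bond-miimon 100

    # ceph.conf -- one subnet per role, riding on the bonds above
    [global]
    public network  = 192.168.10.0/24
    cluster network = 192.168.20.0/24

Active-backup gives up the aggregate bandwidth of the second link, but it needs no cooperation from the switches, so losing one switch should only drop the standby legs.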
On 05/06/2014 21:21, Scott Laird wrote:
> Doing bonding without LACP is probably going to end up being painful.
> Sooner or later you're going to end up with one end thinking that
> bonding is working while the other end thinks that it's not, and half
> of your traffic is going to get black-holed.
>
> I've had moderately decent luck running Ceph on top of a weird network
> by carefully controlling the source address that every outbound
> connection uses and then telling Ceph that it's running with a
> 1-network config. With Linux, the default source address of an
> outbound TCP connection is a function of the route that the kernel
> picks to send traffic to the remote end, and you can override it on a
> per-route basis (it's visible as the 'src' attribute in iproute).
>
> I have a mixed Infiniband+GigE network with each host running an OSPF
> routing daemon (for non-Ceph reasons, mostly), and the only two ways
> that I could get Ceph to be happy were:
>
> 1. Turn off the Infiniband network. Slow, and causes other problems.
> 2. Tell Ceph that there was no cluster network, and tell the OSPF
> daemon to always set src=$eth0_ip on routes that it adds. Then just
> pretend that the Ethernet network is the only one that exists, and
> sometimes you get a sudden and unexpected boost in bandwidth due to
> /32 routes that send traffic via Infiniband instead of Ethernet.
>
> It works, but I wouldn't recommend it for production. It would have
> been cheaper for me to buy a 10 GigE switch and cards for my garage
> than to have debugged all of this, and that's just for a hobby project.
>
> OTOH, it's probably the only way to get working multipathing for Ceph.
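(As an aside, for anyone who wants to experiment with the 'src' trick Scott describes: it is the per-route 'src' attribute in iproute2. A rough sketch with made-up addresses and device names, and without the OSPF daemon part:

    # pin the source address used for new outbound connections matching this route
    ip route replace 10.0.1.0/24 dev eth0 src 10.0.1.21

    # check which route and source address the kernel would pick for a destination
    ip route get 10.0.1.42

Note that this only affects connections that do not bind a source address explicitly, which is exactly the case Scott relies on.)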
> On Thu, Jun 5, 2014 at 10:50 AM, Cedric Lemarchand <ced...@yipikai.org> wrote:
>
> On 05/06/2014 18:27, Sven Budde wrote:
> > Hi Alexandre,
> >
> > thanks for the reply. As said, my switches are not stackable, so
> > using LACP seems not to be my best option.
> >
> > I'm seeking an explanation of how Ceph utilizes two (or more)
> > independent links on both the public and the cluster network.
>
> AFAIK, Ceph does not support multiple IP links in the same "designated
> network" (aka client/osd networks). Ceph is not aware of link
> aggregation; it has to be done at the Ethernet layer, so:
>
> - if your switches are stackable, you can use traditional LACP on both
> sides (switch and Ceph)
> - if they are not, then, as Mariusz said, use the appropriate bonding
> mode on the Ceph side and do not use LACP on the switches.
>
> More info here:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding
>
> Cheers !
>
> > If I configure two IPs for the public network on two NICs, will Ceph
> > route traffic from its (multiple) OSDs on this node over both IPs?
> >
> > Cheers,
> > Sven
> >
> > -----Original Message-----
> > From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
> > Sent: Thursday, 5 June 2014 18:14
> > To: Sven Budde
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Ceph networks, to bond or not to bond?
> >
> > Hi,
> >
> > >>> My low-budget setup consists of two gigabit switches, capable of
> > >>> LACP, but not stackable. For redundancy, I'd like to have my links
> > >>> spread evenly over both switches.
> >
> > If you want to do LACP with both switches, they need to be stackable.
> >
> > (Or use active-backup bonding.)
> >
> > >>> My question where I didn't find a conclusive answer in the
> > >>> documentation and mailing archives:
> > >>> Will the OSDs utilize both 'single' interfaces per network, if I
> > >>> assign two IPs per public and per cluster network? Or will all
> > >>> OSDs just bind on one IP and use only a single link?
> >
> > You just need one IP per bond.
> >
> > With LACP, the load balancing uses a hash algorithm to load-balance
> > TCP connections (that also means that one connection can't use more
> > than one link).
> >
> > Check that your switch supports an ip+port hash algorithm
> > (xmit_hash_policy=layer3+4 in Linux LACP bonding).
> >
> > Like this, each osd->osd connection can be load-balanced, and the same
> > goes for your clients->osd.
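(Again only a sketch, assuming Debian-style ifupdown and placeholder names: on the Ceph side, the LACP variant with the ip+port hash Alexandre mentions would be roughly:

    auto bond0
    iface bond0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer3+4

    # verify the negotiated aggregator and the hash policy in use
    cat /proc/net/bonding/bond0

The switch side still needs a matching LACP port-channel, which is where the stackable requirement comes from when the two links go to different switches.)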
> > ----- Original Message -----
> >
> > From: "Sven Budde" <sven.bu...@itgration-gmbh.de>
> > To: ceph-users@lists.ceph.com
> > Sent: Thursday, 5 June 2014 16:20:04
> > Subject: [ceph-users] Ceph networks, to bond or not to bond?
> >
> > Hello all,
> >
> > I'm currently building a new small cluster with three nodes, each node
> > having 4x 1 Gbit/s network interfaces available and 8-10 OSDs running
> > per node.
> >
> > I thought I'd assign 2x 1 Gb/s to the public network and the other
> > 2x 1 Gb/s to the cluster network.
> >
> > My low-budget setup consists of two gigabit switches, capable of LACP,
> > but not stackable. For redundancy, I'd like to have my links spread
> > evenly over both switches.
> >
> > My question, where I didn't find a conclusive answer in the
> > documentation and mailing archives:
> > Will the OSDs utilize both 'single' interfaces per network if I assign
> > two IPs per public and per cluster network? Or will all OSDs just bind
> > to one IP and use only a single link?
> >
> > I'd rather avoid bonding the NICs: if one switch fails, there would be
> > at least one node unavailable, in the worst case 2 (out of 3),
> > rendering the cluster inoperable.
> >
> > Are there other options I missed? 10 GbE is currently out of our
> > budget ;)
> >
> > Thanks,
> > Sven
>
> --
> Cédric

--
Cédric

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com