Resending in case this email was lost.

On Tue, Jan 23, 2018 at 10:50 PM Mayank Kumar <[email protected]> wrote:
> Thanks Burkhard for the detailed explanation. Regarding the following:
>
> >>> The ceph client (librbd accessing a volume in this case) gets
> asynchronous notification from the ceph mons in case of relevant changes,
> e.g. updates to the osd map reflecting the failure of an OSD.
>
> I have some more questions:
>
> 1. Do the asynchronous notifications for both the osdmap and the monmap
> come from the mons?
> 2. Are these asynchronous notifications retried on failure?
> 3. Is it possible that these asynchronous notifications are lost?
> 4. Do the monmap and osdmap reside in kernel or user space? The reason I
> am asking is: for an rbd volume that is already mounted on a host, will it
> continue to receive those asynchronous notifications for changes to both
> OSD and mon IPs? If all mon IPs change, but the mon configuration file is
> updated to reflect the new mon IPs, will an already-mounted rbd volume
> still be able to contact the OSDs and mons, or is there some form of
> caching in kernel space for an already-mounted rbd volume?
>
> Some more context for why I have all these doubts:
> We internally had a ceph cluster with rbd volumes being provisioned by
> Kubernetes. With existing rbd volumes still mounted, we wiped out the old
> ceph cluster and created a brand new one, but the existing rbd volumes
> from the old cluster remained. Any Kubernetes pod that landed on the same
> host as an old rbd volume would fail to create, because the volume failed
> to attach and mount. Looking at the kernel messages we saw the following:
>
> -- Logs begin at Fri 2018-01-19 02:05:38 GMT, end at Fri 2018-01-19 19:23:14 GMT.
> --
>
> Jan 19 19:20:39 host1.com kernel: libceph: osd2 10.231.171.131:6808 socket closed (con state CONNECTING)
> Jan 19 19:18:30 host1.com kernel: libceph: osd28 10.231.171.52:6808 socket closed (con state CONNECTING)
> Jan 19 19:18:30 host1.com kernel: libceph: osd0 10.231.171.131:6800 socket closed (con state CONNECTING)
> Jan 19 19:15:40 host1.com kernel: libceph: osd21 10.231.171.99:6808 wrong peer at address
> Jan 19 19:15:40 host1.com kernel: libceph: wrong peer, want 10.231.171.99:6808/42661, got 10.231.171.99:6808/73168
> Jan 19 19:15:34 host1.com kernel: libceph: osd11 10.231.171.114:6816 wrong peer at address
> Jan 19 19:15:34 host1.com kernel: libceph: wrong peer, want 10.231.171.114:6816/130908, got 10.231.171.114:6816/85562
>
> The new ceph cluster had new OSD and mon IPs.
>
> So my question: since these messages are coming from the kernel module,
> why can't the kernel module figure out that the mon and OSD IPs have
> changed? Is there some caching in the kernel? When rbd create/attach is
> called on that host, it is passed the new mon IPs, so doesn't that update
> the already-mounted rbd volumes?
>
> Hope I made my doubts clear, and yes, I am a beginner in Ceph with very
> limited knowledge.
>
> Thanks for your help again,
> Mayank
>
> On Tue, Jan 23, 2018 at 1:24 AM, Burkhard Linke <
> [email protected]> wrote:
>
>> Hi,
>>
>> On 01/23/2018 09:53 AM, Mayank Kumar wrote:
>>
>>> Hi Ceph Experts
>>>
>>> I am a new user of Ceph and currently using Kubernetes to deploy Ceph
>>> RBD Volumes.
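As a side note on the log lines above: the "wrong peer" messages show the mismatch directly. A Ceph entity address has the form ip:port/nonce, where the trailing nonce distinguishes daemon instances; the kernel client wanted the OSD instance recorded in its cached osdmap, but a different daemon instance (from the new cluster) answered at the same ip:port. A small sketch (a hypothetical helper, not part of Ceph) that pulls the expected and actual nonces out of such a line:

```python
import re

# Matches libceph "wrong peer" kernel log lines such as:
#   libceph: wrong peer, want 10.231.171.99:6808/42661, got 10.231.171.99:6808/73168
# The /NNN suffix is a per-daemon nonce: same IP:port, different daemon
# instance, so the connection is rejected.
WRONG_PEER = re.compile(
    r"wrong peer, want (?P<ip>[\d.]+):(?P<port>\d+)/(?P<want>\d+), "
    r"got [\d.]+:\d+/(?P<got>\d+)"
)

def parse_wrong_peer(line):
    """Return (ip, port, wanted_nonce, got_nonce), or None if no match."""
    m = WRONG_PEER.search(line)
    if not m:
        return None
    return (m.group("ip"), int(m.group("port")),
            int(m.group("want")), int(m.group("got")))

line = ("Jan 19 19:15:40 host1.com kernel: libceph: wrong peer, "
        "want 10.231.171.99:6808/42661, got 10.231.171.99:6808/73168")
print(parse_wrong_peer(line))  # ('10.231.171.99', 6808, 42661, 73168)
```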
>>> We are doing some initial work rolling it out to internal customers,
>>> and in doing that we are using the IP of the host as the IP of the OSDs
>>> and mons. This means that if a host goes down, we lose that IP. While
>>> we are still experimenting with these behaviors, I wanted to see what
>>> the community thinks about the following scenario:
>>>
>>> 1. An rbd volume is already attached and mounted on host A.
>>> 2. The OSD on which this rbd volume resides dies and never comes back up.
>>> 3. Another OSD is put in its place. I don't know the intricacies here,
>>> but I am assuming the data for this rbd volume either moves to
>>> different OSDs or goes back to the newly installed OSD.
>>> 4. The new OSD has a completely new IP.
>>> 5. Will the rbd volume attached to host A learn the new IP of the OSD
>>> on which its data resides, so that everything just continues to work?
>>>
>>> What if all the mons have also changed IP?
>>
>> A volume does not reside "on an OSD". The volume is striped, and each
>> stripe is stored in a placement group; the placement group in turn is
>> distributed to several OSDs depending on the crush rules and the number
>> of replicates.
>>
>> If an OSD dies, ceph will backfill the now missing replicates to
>> another OSD, given that another OSD satisfying the crush rules is
>> available. The same process is also triggered if an OSD is added.
>>
>> This process is transparent to the ceph client, as long as enough
>> replicates are present. The ceph client (librbd accessing a volume in
>> this case) gets asynchronous notifications from the ceph mons in case
>> of relevant changes, e.g. updates to the osd map reflecting the failure
>> of an OSD. Traffic to the OSD will be automatically rerouted according
>> to the crush rules as explained above. The OSD map also contains the IP
>> addresses of all OSDs, so a change to an IP address is just another
>> update to the map.
>>
>> The only problem you might run into is changing the IP addresses of the
>> mons.
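To make this point concrete: a client never stores "volume X lives on OSD Y". It recomputes object -> PG -> OSD from its current copy of the osd map on every request, so a newer map is all it takes to reroute I/O. A toy sketch of that idea (plain md5 hashing as a stand-in for Ceph's real rjenkins/CRUSH placement; all IDs and addresses invented):

```python
import hashlib

# Toy stand-in for CRUSH: hash an object name to a placement group,
# then look the PG's primary OSD up in the current "osdmap".
# Real Ceph uses rjenkins hashing and the CRUSH algorithm, not this.

PG_NUM = 8

def object_to_pg(obj_name):
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % PG_NUM

def locate(obj_name, osdmap):
    """Map an object to (osd_id, address) using the given map epoch."""
    pg = object_to_pg(obj_name)
    osd_id = osdmap["pg_to_osd"][pg]
    return osd_id, osdmap["osd_addrs"][osd_id]

# Epoch 100: three OSDs at their original addresses.
osdmap_e100 = {
    "epoch": 100,
    "pg_to_osd": {pg: pg % 3 for pg in range(PG_NUM)},
    "osd_addrs": {0: "10.0.0.10:6800", 1: "10.0.0.11:6800", 2: "10.0.0.12:6800"},
}

# Epoch 101, pushed by the mons: osd.2 re-registered with a new IP.
# The client simply adopts the newer map; the next I/O goes to the new
# address without remounting the rbd volume.
osdmap_e101 = dict(osdmap_e100, epoch=101,
                   osd_addrs={**osdmap_e100["osd_addrs"], 2: "10.0.0.99:6800"})

obj = "rbd_data.1234.0000000000000007"
print(locate(obj, osdmap_e100))
print(locate(obj, osdmap_e101))
```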
>> There is also a mon map listing all active mons; if the mon a ceph
>> client is using dies or is removed, the client will switch to another
>> active mon from the map. This works fine in a running system; you can
>> change the IP addresses of the mons one by one without any interruption
>> to the client (theoretically...).
>>
>> The problem is starting the ceph client. In this case the client uses
>> the list of mons from the ceph configuration file to contact one mon
>> and receive the initial mon map. If you change the hostnames/IP
>> addresses of the mons, you also need to update the ceph configuration
>> file.
>>
>> The above outline is how it should work, given a valid ceph and network
>> setup. YMMV.
>>
>> Regards,
>> Burkhard
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
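Those last two paragraphs describe exactly the failure mode reported above: a running client follows monmap updates, but a starting client only knows the mon addresses in its configuration file. A minimal sketch of that bootstrap logic (invented names, not the real Ceph API):

```python
def bootstrap(config_mons, reachable_mons):
    """Contact the first reachable mon from ceph.conf and fetch the monmap.

    config_mons:    mon addresses taken from the ceph configuration file
    reachable_mons: addr -> current monmap (simulates which mons answer)
    """
    for addr in config_mons:
        if addr in reachable_mons:
            # The first successful contact hands over the authoritative
            # monmap; from here on the client follows monmap updates and
            # no longer needs the configuration file.
            return reachable_mons[addr]
    raise ConnectionError("no mon from the config file is reachable")

# The cluster's mons moved from 10.0.0.x to 10.0.1.x.
current_monmap = ["10.0.1.1:6789", "10.0.1.2:6789", "10.0.1.3:6789"]
cluster = {addr: current_monmap for addr in current_monmap}

# Stale ceph.conf: bootstrap fails, as with the stuck rbd attach above.
try:
    bootstrap(["10.0.0.1:6789", "10.0.0.2:6789"], cluster)
except ConnectionError as e:
    print("stale config:", e)

# Updated ceph.conf: a single reachable entry is enough.
print("fresh config:", bootstrap(["10.0.1.1:6789"], cluster))
```

Note that this only covers a freshly starting client; an already-mounted kernel rbd volume from the wiped cluster still holds maps from that dead cluster, which updating the configuration file alone does not fix.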
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
