Nick/Dennis,
Thanks for the info. I did fiddle with a location script that would determine whether the drive is a spinning disk or an SSD and put it in the appropriate bucket. I might move back to that now that I understand Ceph better.

Thanks for the link to the sample script as well.

From: Nick Fisk <[email protected]>
Sent: Thursday, September 15, 2016 3:40 AM
To: Jim Kilborn <[email protected]>; 'Reed Dier' <[email protected]>
Cc: [email protected]
Subject: RE: [ceph-users] Replacing a failed OSD

> -----Original Message-----
> From: ceph-users [mailto:[email protected]] On Behalf Of Jim Kilborn
> Sent: 14 September 2016 20:30
> To: Reed Dier <[email protected]>
> Cc: [email protected]
> Subject: Re: [ceph-users] Replacing a failed OSD
>
> Reed,
>
> Thanks for the response.
>
> Your process is the one that I ran. However, I have a crushmap with ssd and sata drives in different buckets (host made up of host types, with an ssd and a spinning hosttype for each host) because I am using ssd drives for a replicated cache in front of an erasure-coded data pool for cephfs.
>
> I have "osd crush update on start = false" so that osds don't randomly get added to the crush map, because it wouldn't know where to put that osd.
>
> I am using puppet to provision the drives when it sees one in a slot and it doesn't see the ceph signature (I guess). I am using the ceph puppet module.
>
> The real confusion is why I have to remove it from the crush map. Once I remove it from the crush map, it does bring it up as the same osd number, but it's not in the crush map, so I have to put it back where it belongs. Just seems strange that it must be removed from the crush map.
>
> Basically, I export the crush map, remove the osd from the crush map, then redeploy the drive. Then when it gets up and running as the same osd number, I import the exported crush map to get it back in the cluster.
>
> I guess that is just how it has to be done.

You can pass a script in via the 'osd crush location hook' variable so that the OSDs automatically get placed in the right location when they start up. Thanks to Wido there is already a script that you can probably use with very few modifications:

https://gist.github.com/wido/5d26d88366e28e25e23d

> Thanks again
>
> From: Reed Dier <[email protected]>
> Sent: Wednesday, September 14, 2016 1:39 PM
> To: Jim Kilborn <[email protected]>
> Cc: [email protected]
> Subject: Re: [ceph-users] Replacing a failed OSD
>
> Hi Jim,
>
> This is pretty fresh in my mind so hopefully I can help you out here.
>
> Firstly, the crush map will back fill any holes in the enumeration that are existing. So assuming only one drive has been removed from the crush map, it will repopulate the same OSD number.
>
> My steps for removing an OSD are run from the host node:
>
>     ceph osd down osd.i
>     ceph osd out osd.i
>     stop ceph-osd id=i
>     umount /var/lib/ceph/osd/ceph-i
>     ceph osd crush remove osd.i
>     ceph auth del osd.i
>     ceph osd rm osd.i
>
> From here, the disk is removed from the ceph cluster and crush map, and is ready for removal and replacement.
>
> From there I deploy the new osd with ceph-deploy from my admin node using:
>
>     ceph-deploy disk list nodei
>     ceph-deploy disk zap nodei:sdX
>     ceph-deploy --overwrite-conf osd prepare nodei:sdX
>
> This will prepare the disk and insert it back into the crush map, bringing it back up and in. The OSD number should remain the same, as it will fill the gap left from the previous OSD removal.
>
> Hopefully this helps,
>
> Reed
>
> > On Sep 14, 2016, at 11:00 AM, Jim Kilborn <[email protected]> wrote:
> >
> > I am finishing testing our new cephfs cluster and wanted to document a failed osd procedure.
> > I noticed that when I pulled a drive, to simulate a failure, and ran through the replacement steps, the osd has to be removed from the crushmap in order to initialize the new drive as the same osd number.
> >
> > Is this correct that I have to remove it from the crushmap, then after the osd is initialized, and mounted, add it back to the crush map? Is there no way to have it reuse the same osd # without removing it from the crush map?
> >
> > Thanks for taking the time.
> >
> > - Jim

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
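
For anyone reading this thread later, here is a rough sketch of the kind of location hook discussed above: it looks up the disk behind an OSD, checks whether the kernel reports it as rotational, and prints a CRUSH location on one line. This is not Wido's script and not what I actually run. The bucket names (`<hostname>-ssd`, `<hostname>-spinning`), the `root=default` value, the sdX-style device naming, and all function names are assumptions made up for illustration. It also assumes the hook is called with `--cluster`, `--id` and `--type` arguments and that the OSD data directory is mounted at /var/lib/ceph/osd/<cluster>-<id>.

```python
#!/usr/bin/env python3
"""Sketch of a CRUSH location hook: picks an ssd or spinning host bucket per OSD."""
import argparse
import socket
import sys
from pathlib import Path


def osd_partition(osd_id, cluster="ceph", mounts_text=None):
    """Return the device mounted at the OSD data directory, e.g. '/dev/sdb1'."""
    if mounts_text is None:
        mounts_text = Path("/proc/mounts").read_text()
    mount_point = f"/var/lib/ceph/osd/{cluster}-{osd_id}"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return fields[0]
    raise LookupError(f"nothing mounted at {mount_point}")


def parent_disk(partition):
    """Map '/dev/sdb1' to 'sdb'. Assumes sdX-style names (not nvme0n1p1)."""
    return Path(partition).name.rstrip("0123456789")


def is_rotational(disk, sys_block=Path("/sys/block")):
    """True if sysfs reports the disk as a spinning drive."""
    flag = (sys_block / disk / "queue" / "rotational").read_text().strip()
    return flag == "1"


def crush_location(hostname, rotational, root="default"):
    """Build the location string. Bucket naming here is hypothetical."""
    suffix = "spinning" if rotational else "ssd"
    return f"root={root} host={hostname}-{suffix}"


def main(argv):
    parser = argparse.ArgumentParser(description="CRUSH location hook sketch")
    parser.add_argument("--cluster", default="ceph")
    parser.add_argument("--id", required=True)
    parser.add_argument("--type", default="osd")
    args = parser.parse_args(argv)

    disk = parent_disk(osd_partition(args.id, args.cluster))
    hostname = socket.gethostname().split(".")[0]
    # The hook's only output is the location, printed on a single line.
    print(crush_location(hostname, is_rotational(disk)))


# Do nothing when started without arguments; a hook is always given some.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

The script would be referenced from the 'osd crush location hook' setting mentioned above. As far as I understand, the hook is only consulted when the OSD updates its own location at startup, so it would not take effect while "osd crush update on start = false" is set.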
