Thanks Eugen!
I was looking into running all the commands manually, following the docs for 
adding/removing an OSD, but tried ceph-disk first.

I actually made it work by changing the id part in ceph-disk (it was checking 
the wrong journal device, which was owned by root:root). The next issue was 
that I had tried re-using an old journal, so I had to create a new one 
(parted/sgdisk to set the ceph-journal parttype; rough sketch below). Could I 
have just zapped the previous journal instead?
After that, prepare succeeded and the OSD started peering. Unsetting 
nobackfill let it recover a 4 TB HDD in about 9 hours.
The best part was that, by reusing the OSD UUID, I didn't have to backfill 
twice.
I'll see if I can add this to the docs once we've upgraded to Luminous or 
Mimic and started using ceph-volume.
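
For the archives, the journal part looked roughly like this. The journal SSD 
is /dev/sda here (partition 8, as in my prepare command), the size is just a 
placeholder, and the parttype GUID is the one ceph-disk's udev rules appear 
to match on, so please double-check it against your version before relying 
on it:

  # recreate the journal partition with the Ceph journal parttype, so the
  # udev rules chown it to ceph:ceph on their own
  sgdisk --delete=8 /dev/sda
  sgdisk --new=8:0:+20G --change-name=8:'ceph journal' \
         --typecode=8:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  partprobe /dev/sda
  # if the journal device still ends up owned by root:root, fix it by hand
  chown ceph:ceph /dev/sda8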
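
And the rough order of operations, in case someone else hits the same thing 
(the ceph-disk call is the same one from my first mail, with placeholders in 
angle brackets):

  ceph osd set noout
  ceph osd set nobackfill
  # prepare the new HDD, reusing the old OSD UUID so the PGs only get
  # refilled instead of the data rebalancing twice
  ceph-disk -v prepare --cluster-uuid <CLUSTER_UUID> --osd-uuid <OLD_OSD_UUID> \
            --journal-dev --data-dev --fs-type xfs /dev/sdo /dev/sda8
  # once the OSD is up and peering has settled, let it backfill
  ceph osd unset nobackfill
  ceph osd unset noout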

Kind Regards
David Majchrzak

On Aug 3 2018, at 4:16 pm, Eugen Block <ebl...@nde.ag> wrote:
>
> Hi,
> we have a full bluestore cluster and had to deal with read errors on
> the SSD for the block.db. Something like this helped us to recreate a
> pre-existing OSD without rebalancing, just refilling the PGs. I would
> zap the journal device and let it be recreated. It's very similar to your
> ceph-deploy output, but maybe you get more out of it if you run it manually:
>
> ceph-osd [--cluster-uuid <CLUSTER_UUID>] [--osd-objectstore filestore]
> --mkfs -i <OSD_ID> --osd-journal <PATH_TO_SSD> --osd-data
> /var/lib/ceph/osd/ceph-<OSD_ID>/ --mkjournal --setuser ceph --setgroup
> ceph --osd-uuid <OSD_UUID>
>
> Maybe after zapping the journal this will work. At least it would rule
> out the old journal as the show-stopper.
>
> Regards,
> Eugen
>
>
> Zitat von David Majchrzak <da...@oderland.se>:
> > Hi!
> > Trying to replace an OSD on a Jewel cluster (filestore data on HDD +
> > journal device on SSD).
> > I've set noout and removed the flapping drive (read errors) and
> > replaced it with a new one.
> >
> > I've noted down the osd UUID to be able to prepare the new disk with
> > the same osd.ID. The journal device is the same as the previous one
> > (should I delete the partition and recreate it?)
> > However, running ceph-disk prepare returns:
> > # ceph-disk -v prepare --cluster-uuid
> > c51a2683-55dc-4634-9d9d-f0fec9a6f389 --osd-uuid
> > dc49691a-2950-4028-91ea-742ffc9ed63f --journal-dev --data-dev
> > --fs-type xfs /dev/sdo /dev/sda8
> > command: Running command: /usr/bin/ceph-osd --check-allows-journal
> > -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph
> > --setuser ceph --setgroup ceph
> > command: Running command: /usr/bin/ceph-osd --check-wants-journal -i
> > 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph
> > --setuser ceph --setgroup ceph
> > command: Running command: /usr/bin/ceph-osd --check-needs-journal -i
> > 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph
> > --setuser ceph --setgroup ceph
> > Traceback (most recent call last):
> > File "/usr/sbin/ceph-disk", line 9, in <module>
> > load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5371, in run
> > main(sys.argv[1:])
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5322, in 
> > main
> > args.func(args)
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1900, in 
> > main
> > Prepare.factory(args).prepare()
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line
> > 1896, in factory
> > return PrepareFilestore(args)
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line
> > 1909, in __init__
> > self.journal = PrepareJournal(args)
> > File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line
> > 2221, in __init__
> > raise Error('journal specified but not allowed by osd backend')
> > ceph_disk.main.Error: Error: journal specified but not allowed by osd 
> > backend
> >
> > I tried googling first, of course. It COULD be that we have set
> > setuser_match_path globally in ceph.conf (like in this bug report:
> > https://tracker.ceph.com/issues/19642), since the cluster was created
> > back on Dumpling a long time ago.
> > What's the best practice to fix it? Create [osd.X] sections and set
> > setuser_match_path there instead for the old OSDs?
> > Are there any other steps I should take before this if I want to reuse
> > the same osd UUID? I've only stopped ceph-osd@21, removed the physical
> > disk, inserted the new one and tried running prepare.
> > Kind Regards,
> > David

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
