Nice catch. That was a copy-paste error. Sorry, it should have read:
3. Flush the journal and export the primary version of the PG. This took 1
minute on a well-behaved PG and 4 hours on the misbehaving PG, i.e.

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export --file /root/32.10c.b.export

4. Import the PG into a new / temporary OSD that is also offline, i.e.

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op import --file /root/32.10c.b.export

On Thu, Jun 2, 2016 at 5:10 PM, Brad Hubbard <[email protected]> wrote:
> On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP
> <[email protected]> wrote:
> >
> > The only way that I was able to get back to Health_OK was to
> > export/import. ***** Please note, any time you use the
> > ceph_objectstore_tool you risk data loss if not done carefully. Never
> > remove a PG until you have a known good export *****
> >
> > Here are the steps I used:
> >
> > 1. set NOOUT, NOBACKFILL
> > 2. Stop the OSDs that have the erroring PG
> > 3. Flush the journal and export the primary version of the PG. This
> > took 1 minute on a well-behaved PG and 4 hours on the misbehaving PG,
> > i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
> > --journal-path /var/lib/ceph/osd/ceph-16/journal --pgid 32.10c --op export
> > --file /root/32.10c.b.export
> >
> > 4. Import the PG into a new / temporary OSD that is also offline,
> > i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100
> > --journal-path /var/lib/ceph/osd/ceph-100/journal --pgid 32.10c --op export
> > --file /root/32.10c.b.export
>
> This should be an import op and presumably to a different data path
> and journal path more like the following?
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-101
> --journal-path /var/lib/ceph/osd/ceph-101/journal --pgid 32.10c --op
> import --file /root/32.10c.b.export
>
> Just trying to clarify for anyone that comes across this thread in the
> future.
>
> Cheers,
> Brad
>
> > 5. remove the PG from all other OSDs (16, 143, 214, and 448 in your
> > case it looks like)
> > 6. Start cluster OSDs
> > 7. Start the temporary OSDs and ensure 32.10c backfills correctly to
> > the 3 OSDs it is supposed to be on.
> >
> > This is similar to the recovery process described in this post from
> > 04/09/2015:
> > http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
> > Hopefully it works in your case too and you can get the cluster back to a
> > state that you can make the CephFS directories smaller.
> >
> > - Brandon
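For anyone following along later, the corrected export/import sequence above can be sketched as a small shell script. This is only a sketch: the OSD IDs (16 as the source primary, 101 as the temporary OSD), the PG ID 32.10c, and the file paths are the values from this thread and will differ per cluster, and DRY_RUN defaults to 1 so it only prints the commands rather than touching a live cluster. As noted above, the OSDs must already be stopped before running the tool.

```shell
#!/bin/sh
# Sketch of the export/import steps from this thread.
# DRY_RUN=1 (the default) only echoes each command; set DRY_RUN=0 to run
# them for real, at your own risk -- and never remove a PG until you have
# a known good export.
DRY_RUN=${DRY_RUN:-1}

PGID=32.10c                          # the incomplete PG from this thread
SRC_OSD=16                           # stopped OSD holding the primary copy
DST_OSD=101                          # stopped new/temporary OSD
EXPORT_FILE=/root/${PGID}.b.export   # export destination on the local disk

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Step 1: keep the cluster from reacting while OSDs are down
run ceph osd set noout
run ceph osd set nobackfill

# Step 2 (stop the affected OSDs) is done via your init system and
# is omitted here.

# Step 3: export the primary version of the PG from the stopped OSD
run ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-${SRC_OSD} \
    --journal-path /var/lib/ceph/osd/ceph-${SRC_OSD}/journal \
    --pgid "${PGID}" --op export --file "${EXPORT_FILE}"

# Step 4: import it into the temporary OSD (also stopped)
run ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-${DST_OSD} \
    --journal-path /var/lib/ceph/osd/ceph-${DST_OSD}/journal \
    --pgid "${PGID}" --op import --file "${EXPORT_FILE}"
```

Steps 5-7 (removing the PG from the other OSDs and restarting everything so 32.10c backfills) are intentionally left out of the sketch, since the remove op is exactly the dangerous part flagged in the warning above.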
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
