I ran into a problem like this before with small flash OSDs used for
metadata.  There is an open tracker about why they were able to fill 100%
of the way up, but no work has been done on it in the 6 months since I got
back to healthy.  The way I recovered was by deleting one copy of each
affected PG from a single OSD (different PGs on each OSD), taking me down
to 2 replicas of those PGs.  i.e. I used ceph-objectstore-tool to delete
pgs 4.1, 4.2, and 4.3 on osd.6, pgs 4.4, 4.5, and 4.7 on osd.7, etc.  That
freed enough space for the cluster to compact what it needed to, and it
also let me change the crush rule for the pool so its PGs moved to larger
disks until I had larger OSDs to move them back to later.
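
Roughly, the commands looked like the following (the OSD paths and PG IDs
here are only placeholders for my setup; the OSD has to be stopped first,
and you need to be sure at least one other healthy copy of each PG exists
elsewhere):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
      --journal-path /var/lib/ceph/osd/ceph-6/journal \
      --pgid 4.1 --op remove
  # repeated for 4.2 and 4.3 on osd.6, then 4.4, 4.5 and 4.7 on osd.7, etc.
  # (newer releases may require adding --force to --op remove)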

On Mon, Aug 27, 2018 at 7:36 AM Tomasz Kuzemko <tomasz.kuze...@corp.ovh.com>
wrote:

> Hello Josef,
> I would suggest setting up a bigger disk (if not a physical one, then maybe
> an LVM volume spanning 2 smaller disks), cloning the OSD data dir to the
> new disk (remember the extended attributes!), and then trying to bring the
> OSD back into the cluster.
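> For example, something along these lines (paths are just an example) should
> carry the xattrs over:
>
>   rsync -aHAXS /var/lib/ceph/osd/ceph-N/ /mnt/bigger-disk/osd-N/
>
> You can spot-check a few files with getfattr -d on the copy before starting
> the OSD from it.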
>
> --
> Tomasz Kuzemko
> tomasz.kuze...@corp.ovh.com
>
> ________________________________________
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
> Josef Zelenka <josef.zele...@cloudevelops.com>
> Sent: Monday, August 27, 2018 13:29
> To: Paul Emmerich; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] pgs incomplete and inactive
>
> The full ratio was ignored - that's most likely why this happened. I
> can't delete PGs, because they only take up a few KB of space - the OSD is
> 40 GB, and 39.8 GB of it is taken up by omap - which is why I can't
> move/extract anything. Any clue on how to compact/move away the omap dir?
>
>
>
> On 27/08/18 12:34, Paul Emmerich wrote:
> > Don't ever let an OSD run 100% full; that's usually bad news.
> > Two ways to salvage this (rough example commands below):
> >
> > 1. You can try to extract the PGs with ceph-objectstore-tool and
> > inject them into another OSD; Ceph will find them and recover
> > 2. You seem to be using Filestore, so you should easily be able to
> > just delete a whole PG on the full OSD's file system to make space
> > (preferably one that is already recovered and active+clean even
> > without the dead OSD)
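> >
> > As a rough sketch of option 1 (OSD paths and the PG ID are placeholders),
> > with both OSD daemons stopped:
> >
> >   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
> >       --pgid 4.a --op export --file /root/pg-4.a.export
> >   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
> >       --pgid 4.a --op import --file /root/pg-4.a.export
> >
> > For option 2 on Filestore, each PG lives in a directory named
> > current/<pgid>_head inside the OSD's data dir; removing one that is
> > already active+clean elsewhere frees space (ceph-objectstore-tool's
> > --op remove does the same thing a bit more safely).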
> >
> >
> > Paul
> >
> > 2018-08-27 10:44 GMT+02:00 Josef Zelenka <josef.zele...@cloudevelops.com>:
> >> Hi, I've had a very ugly thing happen to me over the weekend. Some of my
> >> OSDs in a root that handles metadata pools overflowed to 100% disk usage
> >> due to omap size (even though I had a 97% full ratio, which is odd) and
> >> refused to start. There were some PGs on those OSDs that went away with
> >> them. I have tried compacting the omap, moving files away, etc., but
> >> nothing worked - I can't export the PGs; I get errors like this:
> >>
> >> 2018-08-27 04:42:33.436182 7fcb53382580  4 rocksdb: EVENT_LOG_v1
> >> {"time_micros": 1535359353436170, "job": 1, "event": "recovery_started",
> >> "log_files": [5504, 5507]}
> >> 2018-08-27 04:42:33.436194 7fcb53382580  4 rocksdb:
> >> [/build/ceph-12.2.5/src/rocksdb/db/db_impl_open.cc:482] Recovering log #5504 mode 2
> >> 2018-08-27 04:42:35.422502 7fcb53382580  4 rocksdb:
> >> [/build/ceph-12.2.5/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
> >> 2018-08-27 04:42:35.431613 7fcb53382580  4 rocksdb:
> >> [/build/ceph-12.2.5/src/rocksdb/db/db_impl.cc:343] Shutdown complete
> >> 2018-08-27 04:42:35.431716 7fcb53382580 -1 rocksdb: IO error: No space left on
> >> device/var/lib/ceph/osd/ceph-5//current/omap/005507.sst: No space left on device
> >> Mount failed with '(1) Operation not permitted'
> >> 2018-08-27 04:42:35.432945 7fcb53382580 -1
> >> filestore(/var/lib/ceph/osd/ceph-5/) mount(1723): Error initializing rocksdb :
> >>
> >> I decided to take the loss, mark the OSDs as lost, and remove them from
> >> the cluster; however, that left 4 PGs hanging in an incomplete + inactive
> >> state, which apparently prevents my radosgw from starting. Is there
> >> another way to export/import the PGs into their new OSDs, or to recreate
> >> them? I'm running Luminous 12.2.5 on Ubuntu 16.04.
> >>
> >> Thanks
> >>
> >> Josef
> >>
> >
> >
>
>
