Deep scrub doesn't help.
After some steps (I'm not sure of the exact sequence)
ceph does remap this PG to other OSDs, but the PG doesn't move:
# ceph pg map 11.206
osdmap e176314 pg 11.206 (11.206) -> up [955,198,801] acting [787,697]
It hangs in this state forever; 'ceph pg 11.206 query' hangs as well.
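For reference, the inspection steps above can be sketched as shell commands (a minimal sketch, assuming admin access to the cluster; the 30s timeout is an illustrative choice, not part of the original report):

```shell
# Show the computed (up) vs. currently-serving (acting) OSD sets for the PG.
ceph pg map 11.206

# List PGs stuck in unclean states cluster-wide.
ceph pg dump_stuck unclean

# Query the PG's detailed state; since it hangs in our case,
# bound it with a timeout instead of letting it block the shell.
timeout 30 ceph pg 11.206 query || echo "pg query timed out"
```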
On Sat, Apr 7, 2018 at 12:42 AM, Konstantin Danilov wrote:
>> What happens when you deep-scrub this PG?
> We haven't tried to deep-scrub it yet; will try.
>> What do the OSD logs show for any lines involving the problem PGs?
> Nothing special was logged about this particular OSD.
> Yet the OSD consumes quite a large portion of its CPU time in the
> snappy/leveldb/jemalloc libs.
> In the logs there are a lot of messages from leveldb about moving data between
> levels. Needless to say, this PG is from the RGW bucket index pool, so it's metadata
> and gets a relatively high load. Yet now we have 3 PGs with the same
> behavior from the RGW data pool (the cluster has almost all of its data in RGW).
>> Was anything happening on your cluster just before this started happening
>> at first?
> The cluster got many updates in the week before the issue, but nothing particularly
> unusual: each SSD OSD was split in two, and about 10% of the OSDs were removed. Some networking
> changes happened as well.
> On Fri, Apr 6, 2018 at 10:07 PM, David Turner <drakonst...@gmail.com> wrote:
>> What happens when you deep-scrub this PG? What do the OSD logs show for
>> any lines involving the problem PGs? Was anything happening on your cluster
>> just before this started happening at first?
>> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov <kdani...@mirantis.com> wrote:
>>> Hi all, we have a strange issue on one cluster.
>>> One PG is mapped to a particular set of OSDs, say X, Y and Z, no matter
>>> how we change the crush map.
>>> The whole picture is as follows:
>>> * This is ceph version 10.2.7; all monitors and OSDs run the same version
>>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>>> was active+clean for a long time
>>> and already held some data. We can't pinpoint the event that led it
>>> into this state; it probably
>>> happened after some OSD was removed from the cluster
>>> * This PG has all 3 required OSDs up and running, and all of them are
>>> online (pool_sz=3, min_pool_sz=2)
>>> * All requests to the PG are stuck forever; historic_ops shows they are waiting
>>> on "waiting_for_degraded_pg"
>>> * ceph pg query hangs forever
>>> * We can't copy the data to another pool either - the copying process hangs
>>> and then fails with
>>> (34) Numerical result out of range
>>> * We tried restarting OSDs, nodes, and mons, with no effect
>>> * Eventually we found that shutting down osd Z (not the primary) does solve
>>> the issue, but
>>> only until ceph marks this osd out. If we try to change the weight
>>> of this osd or remove it from the cluster, the problem appears again. The cluster
>>> works only while osd Z is down but not out and has its default weight
>>> * Then we found that no matter what we do with the crushmap,
>>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>>> osds - [X, Y] (in this osdmap Z is already down). We updated the crush map
>>> to remove the nodes with OSDs X, Y and Z completely, compiled it,
>>> imported it back into the osdmap, ran osdmaptool, and always got the same mapping
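>>> The crushmap round-trip described above can be sketched as follows (a
>>> minimal sketch, assuming a local copy of the osdmap; all file names are
>>> placeholder choices, not from the original report):

```shell
# Grab the current osdmap and extract its crushmap.
ceph osd getmap -o osdmap.bin
osdmaptool osdmap.bin --export-crush crush.bin
crushtool -d crush.bin -o crush.txt

# ... edit crush.txt to remove the buckets holding OSDs X, Y and Z ...

# Recompile the edited crushmap and import it back into the osdmap copy.
crushtool -c crush.txt -o crush.new
osdmaptool osdmap.bin --import-crush crush.new

# Re-run the mapping test and see where the problem PG lands.
osdmaptool osdmap.bin --test-map-pgs-dump | grep '^11\.206'
```

>>> Note that osdmaptool only modifies the local osdmap.bin copy, never the
>>> live cluster, which is what makes this kind of experiment safe to repeat.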
>>> * After several node restarts and setting osd Z down but not out, we
>>> now have 3 more PGs with the same behaviour, but 'pinned' to another set of osds
>>> * We ran osdmaptool from luminous ceph to check whether the upmap
>>> extension had somehow gotten into this osdmap - it had not.
>>> So this is where we are now. Has anyone seen something like this? Any
>>> ideas are welcome. Thanks
>>> Kostiantyn Danilov
>>> ceph-users mailing list
> Kostiantyn Danilov aka koder.ua
> Principal software engineer, Mirantis
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis