David,

> What happens when you deep-scrub this PG?
We haven't tried to deep-scrub it yet; we will.
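
For reference, roughly what we plan to run - the PG id below is a
placeholder, the real one comes from "ceph health detail":

  ceph pg deep-scrub <pgid>

and then we'll watch "ceph -w" and the primary OSD's log for the result.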

> What do the OSD logs show for any lines involving the problem PGs?
Nothing special was logged about this particular OSD, except that it's
degraded. However, the OSD spends quite a large portion of its CPU time
in the snappy/leveldb/jemalloc libraries, and the logs contain a lot of
messages from leveldb about moving data between levels.
Needless to say, this PG is from the RGW bucket index pool, so it is
metadata only and gets a relatively high load. Yet now we have 3 PGs with
the same behavior from the RGW data pool (the cluster has almost all of
its data in RGW).
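
In case it helps, this is roughly how we looked at it; perf and the
default log location are just our setup, nothing Ceph-specific:

  perf top -p <osd-pid>      # most samples in libsnappy / libleveldb / libjemalloc
  grep -iE 'compact|level' /var/log/ceph/ceph-osd.<id>.log | tail -n 50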

> Was anything happening on your cluster just before this started happening
at first?
The cluster received many updates in the week before the issue, but
nothing particularly noticeable: the SSD OSDs were split in two, about
10% of the OSDs were removed, and some networking issues appeared.
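
For completeness, the osdmaptool check mentioned below was done roughly
like this (file names and the grep pattern are placeholders):

  ceph osd getmap -o osdmap.bin
  osdmaptool osdmap.bin --export-crush crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt (e.g. drop the hosts holding OSDs X, Y and Z), then:
  crushtool -c crush.txt -o crush.new
  osdmaptool osdmap.bin --import-crush crush.new
  osdmaptool osdmap.bin --test-map-pgs-dump | grep <pgid>

No matter what we change in crush.txt, the dump still shows the PG on
[X, Y]. The stuck requests themselves were seen with
"ceph daemon osd.<id> dump_historic_ops" on the primary.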

Thanks

On Fri, Apr 6, 2018 at 10:07 PM, David Turner <drakonst...@gmail.com> wrote:

> What happens when you deep-scrub this PG?  What do the OSD logs show for
> any lines involving the problem PGs?  Was anything happening on your
> cluster just before this started happening at first?
>
> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov <kdani...@mirantis.com>
> wrote:
>
>> Hi all, we have a strange issue on one cluster.
>>
>> One PG is mapped to a particular set of OSDs, say X, Y and Z, no matter
>> how we change the crush map.
>> The whole picture is as follows:
>>
>> * This is ceph version 10.2.7; all monitors and OSDs run the same
>> version
>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>> was active+clean for a long time and already has some data. We can't
>> detect the event that led it to this state; it probably happened after
>> some OSD was removed from the cluster
>> * This PG has all 3 required OSDs up and running, and all of them are
>> online (pool_sz=3, min_pool_sz=2)
>> * All requests to the PG are stuck forever; historic_ops shows they are
>> waiting on "waiting_for_degraded_pg"
>> * ceph pg query hangs forever
>> * We can't copy the data to another pool either - the copying process
>> hangs and then fails with (34) Numerical result out of range
>> * We tried restarting OSDs, nodes and mons with no effect
>> * Eventually we found that shutting down OSD Z (not the primary) does
>> solve the issue, but only until ceph marks this OSD out. If we try to
>> change the weight of this OSD or remove it from the cluster, the
>> problem appears again. The cluster works only while OSD Z is down but
>> not out and has its default weight
>> * Then we found that no matter what we do with the crush map,
>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>> OSDs - [X, Y] (in this osdmap Z is already down). We updated the crush
>> map to remove the nodes holding OSDs X, Y and Z from it completely,
>> compiled it, imported it back into the osdmap, ran osdmaptool, and
>> always got the same result
>> * After several node restarts and setting OSD Z down but not out, we
>> now have 3 more PGs with the same behaviour, but 'pinned' to different
>> OSDs
>> * We ran osdmaptool from luminous ceph to check whether the upmap
>> extension had somehow gotten into this osdmap - it has not.
>>
>> So this is where we are now. Has anyone seen something like this? Any
>> ideas are welcome. Thanks
>>
>>
>> --
>> Kostiantyn Danilov
>>
>


-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
