David,

> What happens when you deep-scrub this PG?
We haven't tried to deep-scrub it yet; we will try.
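For reference, the plan is to trigger it manually with something like the
following ('<pgid>' is just a placeholder for the actual PG id) and then
watch the primary OSD log and the cluster log for the scrub result:

  ceph pg deep-scrub <pgid>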
> What do the OSD logs show for any lines involving the problem PGs?
Nothing special is logged about this particular OSD except that it is
degraded. Yet the OSD spends quite a large portion of its CPU time in the
snappy/leveldb/jemalloc libraries, and the logs contain a lot of leveldb
messages about moving data between levels. Needless to mention, this PG is
from the RGW bucket index pool, so it is metadata-only and gets a
relatively high load. Yet now we have 3 more PGs with the same behaviour
from the RGW data pool (the cluster holds almost all of its data in RGW).

> Was anything happening on your cluster just before this started happening at first?
The cluster received many updates in the week before the issue, but
nothing particularly noticeable: the SSD OSDs were split in two, about 10%
of the OSDs were removed, and some networking issues appeared.

Thanks

On Fri, Apr 6, 2018 at 10:07 PM, David Turner <drakonst...@gmail.com> wrote:
> What happens when you deep-scrub this PG? What do the OSD logs show for
> any lines involving the problem PGs? Was anything happening on your
> cluster just before this started happening at first?
>
> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov <kdani...@mirantis.com>
> wrote:
>
>> Hi all, we have a strange issue on one cluster.
>>
>> One PG is mapped to a particular set of OSDs, say X, Y and Z, no matter
>> how we change the crush map.
>> The whole picture is as follows:
>>
>> * This is ceph version 10.2.7; all monitors and OSDs run the same version
>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>> was active+clean for a long time and already has some data. We can't
>> pinpoint the event which led it to this state; it probably happened
>> after some OSD was removed from the cluster
>> * This PG has all 3 required OSDs up and running, and all of them are
>> online (pool_sz=3, min_pool_sz=2)
>> * All requests to the PG get stuck forever; historic_ops shows they are
>> waiting on "waiting_for_degraded_pg"
>> * ceph pg query hangs forever
>> * We can't copy the data out to another pool either - the copying
>> process hangs and then fails with (34) Numerical result out of range
>> * We tried restarting OSDs, nodes and mons with no effect
>> * Eventually we found that shutting down OSD Z (not the primary) does
>> solve the issue, but only until ceph marks this OSD out. If we try to
>> change the weight of this OSD or remove it from the cluster the problem
>> appears again. The cluster works only while OSD Z is down but not out
>> and keeps its default weight
>> * Then we found that no matter what we do with the crush map,
>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>> OSDs - [X, Y] (in this osdmap Z is already down). We updated the crush
>> map to remove the nodes holding OSDs X, Y and Z completely, compiled it,
>> imported it back into the osdmap, ran osdmaptool, and always got the
>> same result
>> * After several node restarts and setting OSD Z down but not out, we now
>> have 3 more PGs with the same behaviour, but 'pinned' to other OSDs
>> * We ran osdmaptool from luminous to check whether the upmap extension
>> has somehow gotten into this osdmap - it has not.
>>
>> So this is where we are now. Has anyone seen something like this? Any
>> ideas are welcome.
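>> For reference, the checks above amount to roughly the following
>> commands (the ids and file names are placeholders):
>>
>>   ceph daemon osd.X dump_historic_ops            # ops sit in waiting_for_degraded_pg
>>   ceph pg <pgid> query                           # hangs
>>
>>   ceph osd getmap -o /tmp/osdmap                 # grab the current osdmap
>>   osdmaptool /tmp/osdmap --export-crush /tmp/crush.bin
>>   crushtool -d /tmp/crush.bin -o /tmp/crush.txt  # decompile and edit the crush map
>>   crushtool -c /tmp/crush.txt -o /tmp/crush.new
>>   osdmaptool /tmp/osdmap --import-crush /tmp/crush.new
>>   osdmaptool /tmp/osdmap --test-map-pgs-dump --pool <pool-id> | grep <pgid>
>>
>> No matter how the crush map is edited, the last command keeps mapping
>> the PG to [X, Y].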
>> Thanks
>>
>> --
>> Kostiantyn Danilov
>

--
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis
skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com