What version was this on?
-Greg

On Thursday, April 18, 2013, Dan van der Ster wrote:
> Sorry for the noise.. we now have a better idea what happened here.
>
> For those that might care, basically we had one client looping while
> trying to list the / bucket with an incorrect key. rgw was handling
> this at 1kHz, so congratulations on that. I will now go and read how
> to either decrease the log level or increase the log rotate frequency.
>
> Thanks again,
> Dan
> CERN IT
>
> On Thu, Apr 18, 2013 at 4:09 PM, Dan van der Ster <[email protected]> wrote:
> > Replying to myself...
> > I just noticed this:
> >
> > [root@ceph-radosgw01 ceph]# ls -lh /var/log/ceph/
> > total 27G
> > -rw-r--r--. 1 root root 27G Apr 18 16:08 radosgw.log
> > -rw-r--r--. 1 root root  20 Apr  5 03:13 radosgw.log-20130405.gz
> > -rw-r--r--. 1 root root  20 Apr  6 03:14 radosgw.log-20130406.gz
> > -rw-r--r--. 1 root root  20 Apr  7 03:50 radosgw.log-20130407.gz
> > -rw-r--r--. 1 root root  20 Apr  8 03:29 radosgw.log-20130408.gz
> > -rw-r--r--. 1 root root  20 Apr  9 03:19 radosgw.log-20130409.gz
> > -rw-r--r--. 1 root root  20 Apr 10 03:15 radosgw.log-20130410.gz
> > -rw-r--r--. 1 root root   0 Apr 11 03:48 radosgw.log-20130411
> >
> > [root@ceph-radosgw01 ceph]# df -h .
> > Filesystem            Size  Used Avail Use% Mounted on
> > /dev/mapper/vg1-root   37G   37G     0 100% /
> >
> > The radosgw log filled up the disk. Perhaps this caused the problem..
> >
> > Cheers, Dan
> > CERN IT
> >
> > On Thu, Apr 18, 2013 at 3:52 PM, Dan van der Ster <[email protected]> wrote:
> >> Hi,
> >>
> >> tl;dr: something deleted the objects from the .rgw.gc and then the pgs
> >> went inconsistent. Is this normal??!!
> >>
> >> Just now we had scrub errors and resulting inconsistencies on many of
> >> the pgs belonging to our .rgw.gc pool.
> >>
> >> HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
> >> pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
> >> pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
> >> pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
> >> pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
> >> pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
> >> …
> >>
> >> [root@ceph-mon1 ~]# ceph osd lspools
> >> 0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,
> >> 12 .rgw.control,13 .users.uid,14 .users.email,15 .users,16 .rgw.buckets,17 .usage,
> >>
> >> On the relevant hosts, I checked what was in those directories:
> >>
> >> [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
> >> total 20
> >> drwxr-xr-x.   2 root root     6 Apr 16 10:48 .
> >> drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..
> >>
> >> They were all empty like that. I checked the log files:
> >>
> >> 2013-04-18 14:53:56.532054 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:53:56.532065 7fe5457fb700  0 log [ERR] : 11.0 deep-scrub 1 errors
> >> 2013-04-18 14:53:59.532401 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:53:59.532411 7fe5457fb700  0 log [ERR] : 11.8 deep-scrub 1 errors
> >> 2013-04-18 14:54:01.532602 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:54:01.532614 7fe5457fb700  0 log [ERR] : 11.10 deep-scrub 1 errors
> >> 2013-04-18 14:54:02.532839 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:54:02.532848 7fe5457fb700  0 log [ERR] : 11.18 deep-scrub 1 errors
> >> …
> >> 2013-04-18 14:57:14.554431 7fe5457fb700  0 log [ERR] : 11.1f0 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 14:57:14.554438 7fe5457fb700  0 log [ERR] : 11.1f0 deep-scrub 1 errors
> >>
> >> So it looks like something deleted all the objects from those pg directories.
> >> Next I tried a repair:
> >>
> >> [root@ceph-mon1 ~]# ceph pg repair 11.1f0
> >> instructing pg 11.1f0 on osd.35 to repair
> >> [root@ceph-mon1 ~]# ceph -w
> >> …
> >> 2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
> >> 2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed
> >> [root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
> >> instructing pg 11.1f0 on osd.35 to deep-sc

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
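For anyone else bitten by a runaway radosgw.log like the 27G one above, the two usual knobs are exactly the ones Dan mentions: the rgw debug level in ceph.conf and the logrotate policy for the gateway log. A minimal sketch follows; the client section name, size threshold and rotate count are illustrative assumptions, not settings taken from this thread:

    # /etc/ceph/ceph.conf on the gateway host (section name assumed)
    [client.radosgw.gateway]
        debug rgw = 0
        debug ms = 0

    # /etc/logrotate.d/radosgw (sketch): rotate whenever the file passes 500M.
    # For size-based rotation to fire between the normal daily runs, logrotate
    # has to be invoked more often, e.g. from /etc/cron.hourly.
    # copytruncate avoids having to signal radosgw to reopen its log file,
    # at the cost of possibly losing a few lines during the copy.
    /var/log/ceph/radosgw.log {
        size 500M
        rotate 7
        compress
        missingok
        notifempty
        copytruncate
    }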
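And since repairing 119 inconsistent pgs one at a time gets old quickly, the pg ids can be pulled out of ceph health detail and fed back to ceph pg repair. A rough, untested sketch that assumes the "pg N.M is active+clean+inconsistent ..." output format shown in Dan's report:

    # repair every pg currently reported as inconsistent (sketch, untested)
    ceph health detail | awk '/^pg .*inconsistent/ {print $2}' |
    while read pg; do
        ceph pg repair "$pg"
    done

Judging from the "repair stat mismatch ... 1 errors, 1 fixed" lines above, repair in this case only reconciles the pg stats with the now-empty on-disk contents; it does not bring the deleted .rgw.gc objects back.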
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
