So what is the situation where you need to do:

# cd /var/lib/ceph/osd/ceph-23/current
# rm -Rf *
# df
(...)
I'm quite sure that is not normal.

Shinobu

On Tue, Aug 25, 2015 at 9:41 AM, Goncalo Borges <[email protected]> wrote:
> Hi Jan...
>
> We were interested in the situation where an 'rm -Rf' is done in the
> current directory of an OSD. Here are my findings:
>
> 1. In this exercise, we simply deleted all the content of
> /var/lib/ceph/osd/ceph-23/current:
>
>    # cd /var/lib/ceph/osd/ceph-23/current
>    # rm -Rf *
>    # df
>    (...)
>    /dev/sdj1    2918054776    434548    2917620228    1%    /var/lib/ceph/osd/ceph-23
>
> 2. After some time, ceph enters an error state because it sees an
> inconsistent PG and several scrub errors:
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_ERR
>                1 pgs inconsistent
>                1850 scrub errors
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e1903: 32 osds: 32 up, 32 in
>          pgmap v1041261: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14424 GB used, 74627 GB / 89051 GB avail
>                    2175 active+clean
>                       1 active+clean+inconsistent
>  client io 989 B/s rd, 1 op/s
>
> 3. Looking at ceph.log on the mon, it is possible to see which PG is
> affected and which OSD is responsible for the errors:
>
>    # tail -f /var/log/ceph/ceph.log
>    (...)
>    2015-08-24 11:31:10.139239 osd.13 X.X.X.X:6804/20104 2384 : cluster [ERR]
>    be_compare_scrubmaps: 5.336 shard 23 missing e300336/100000001b0.00002825/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing 32600336/10000000109.00000754/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing dd700336/100000001ab.00000b91/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing bc220336/100000001bd.0000387c/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing f9320336/10000000201.00002e96/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing 1a920336/10000000228.0000d501/head//5
>    (... one 'shard 23 missing' entry per deleted object ...)
>    2015-08-24 11:31:14.336760 osd.13 X.X.X.X:6804/20104 2476 : cluster [ERR] 5.336 scrub 1850 missing, 0 inconsistent objects
>    2015-08-24 11:31:14.336764 osd.13 X.X.X.X:6804/20104 2477 : cluster [ERR] 5.336 scrub 1850 errors
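>
> (Side note: a quick way to double-check which OSDs host the affected PG is
> to map it directly; this is just a convenience check, with the PG id taken
> from the log above:
>
>    # ceph pg map 5.336        # prints the up and acting OSD sets for the PG
>
> In our case the acting set is [13,2,22], which is why osd.13, as the
> primary, is the one reporting the scrub errors against shard 23.)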
>
> 4. We then tried to restart the problematic OSD, but that fails:
>
>    # /etc/init.d/ceph stop osd.23
>    === osd.23 ===
>    Stopping Ceph osd.23 on osd3...done
>    [root@osd3 ~]# /etc/init.d/ceph start osd.23
>    === osd.23 ===
>    create-or-move updated item name 'osd.23' weight 2.72 at location {host=osd3,root=default} to crush map
>    Starting Ceph osd.23 on osd3...
>    starting osd.23 at :/0 osd_data /var/lib/ceph/osd/ceph-23 /var/lib/ceph/osd/ceph-23/journal
>
>    # tail -f /var/log/ceph/ceph-osd.23.log
>    2015-08-24 11:48:12.189322 7fa24d85d800  0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process ceph-osd, pid 7266
>    2015-08-24 11:48:12.389747 7fa24d85d800  0 filestore(/var/lib/ceph/osd/ceph-23) backend xfs (magic 0x58465342)
>    2015-08-24 11:48:12.391370 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP ioctl is supported and appears to work
>    2015-08-24 11:48:12.391381 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>    2015-08-24 11:48:12.404785 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: syscall(SYS_syncfs, fd) fully supported
>    2015-08-24 11:48:12.404874 7fa24d85d800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: disabling extsize, kernel 2.6.32-504.16.2.el6.x86_64 is older than 3.5 and has buggy extsize ioctl
>    2015-08-24 11:48:12.405226 7fa24d85d800 -1 filestore(/var/lib/ceph/osd/ceph-23) mount initial op seq is 0; something is wrong
>    2015-08-24 11:48:12.405243 7fa24d85d800 -1 osd.23 0 OSD:init: unable to mount object store
>    2015-08-24 11:48:12.405251 7fa24d85d800 -1 ERROR: osd init failed: (22) Invalid argument
>
> 5. At this point, osd.23 is reported as 'down' but 'in', and ceph finally
> understands that there is an OSD down and that there are degraded PGs:
>
>    # ceph osd dump
>    (...)
>    osd.23 down in weight 1 up_from 276 up_thru 1838 down_at 1904 last_clean_interval [112,208) X.X.X.X:6812/30826 X.X.X.X:6812/30826 X.X.X.X:6813/30826 X.X.X.X:6813/30826 exists 3010de97-3080-42e2-9a64-4bb1960d40b4
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_ERR
>                202 pgs degraded
>                1 pgs inconsistent
>                201 pgs stuck unclean
>                202 pgs undersized
>                recovery 155149/5664204 objects degraded (2.739%)
>                1850 scrub errors
>                1/32 in osds are down
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e1905: 32 osds: 31 up, 32 in
>          pgmap v1041433: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14424 GB used, 74627 GB / 89051 GB avail
>                155149/5664204 objects degraded (2.739%)
>                    1974 active+clean
>                     201 active+undersized+degraded
>                       1 active+undersized+degraded+inconsistent
>  client io 1023 B/s rd, 0 op/s
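>
> (We did not need it in the end - recovery took care of things once the OSD
> was marked out, as shown below - but since the OSD refuses to mount its
> object store, the standard way out would be to retire it and re-create it.
> A rough sketch, with the OSD id to be adapted:
>
>    # ceph osd out 23                 # stop mapping data to it; by default the mons also mark it out automatically after ~5 min
>    # ceph osd crush remove osd.23    # remove it from the CRUSH map
>    # ceph auth del osd.23            # delete its authentication key
>    # ceph osd rm 23                  # remove it from the osdmap; then re-provision the disk as a fresh OSD
> )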
>
> 6. Recovery I/O starts and finishes, but the system remains in an error
> state because of the inconsistent PG:
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_ERR
>                1 pgs inconsistent
>                1850 scrub errors
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e2097: 32 osds: 31 up, 31 in
>          pgmap v1043287: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14838 GB used, 71430 GB / 86269 GB avail
>                    2172 active+clean
>                       2 active+clean+replay
>                       1 active+clean+scrubbing
>                       1 active+clean+inconsistent
>  client io 1063 B/s rd, 2 op/s
>
>    # ceph health detail
>    HEALTH_ERR 1 pgs inconsistent; 1850 scrub errors
>    pg 5.336 is active+clean+inconsistent, acting [13,2,22]
>    1850 scrub errors
>
> 7. The inconsistent PG is then repaired, and the system returns to a
> 'HEALTH_OK' status:
>
>    # ceph pg repair 5.336
>    instructing pg 5.336 on osd.13 to repair
>
>    $ tail -f /var/log/ceph/ceph.log
>    2015-08-24 12:30:49.583363 mon.0 X.X.X.X:6789/0 289961 : cluster [INF] pgmap v1043322: 2176 pgs: 2173 active+clean, 1 active+clean+inconsistent, 2 active+clean+replay; 4930 GB data, 14833 GB used, 71435 GB / 86269 GB avail; 1023 B/s rd, 1 op/s
>    2015-08-24 12:36:20.894597 osd.13 X.X.X.X:6804/20104 2496 : cluster [INF] 5.336 repair starts
>    2015-08-24 12:39:27.105511 osd.13 X.X.X.X:6804/20104 2497 : cluster [INF] 5.336 repair ok, 0 fixed
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_OK
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e2097: 32 osds: 31 up, 31 in
>          pgmap v1043543: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14828 GB used, 71440 GB / 86269 GB avail
>                    2174 active+clean
>                       2 active+clean+replay
>  client io 1023 B/s rd, 1 op/s
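>
> (As a final sanity check, one could trigger a fresh deep scrub of the PG
> and watch the cluster log; just a suggestion, not something we ran here:
>
>    # ceph pg deep-scrub 5.336        # re-reads and compares all replicas of the PG's objects
>    # tail -f /var/log/ceph/ceph.log  # should eventually log something like '5.336 deep-scrub ok'
> )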
>
> On 08/24/2015 07:49 PM, Jan Schermer wrote:
>> I'm not talking about IO happening, I'm talking about file descriptors
>> staying open. If they weren't open you could umount it without the "-l".
>> Once you hit the OSD again all those open files will start working, and if
>> more need to be opened it will start looking for them...
>>
>> Jan
>>
>>> On 24 Aug 2015, at 03:07, Goncalo Borges <[email protected]> wrote:
>>>
>>> Hi Jan...
>>>
>>> Thanks for the reply.
>>>
>>> On 08/20/2015 07:31 PM, Jan Schermer wrote:
>>>> Just to clarify - you unmounted the filesystem with "umount -l"? That is
>>>> almost never a good idea, and it puts the OSD in a very unusual situation
>>>> where IO will actually work on the open files, but it can't open any new
>>>> ones. I think this would be enough to confuse just about any piece of
>>>> software.
>>>
>>> Yes, I did an 'umount -l', but I was sure that no I/O was happening at the
>>> time. So I was almost 100% sure that there was no real incoherence in
>>> terms of open files in the OS.
>>>
>>>> Was journal on the filesystem or on a separate partition/device?
>>>
>>> The journal is on the same disk, but in a different partition.
>>>
>>>> It's not the same as an R/O filesystem (I hit that once and no such havoc
>>>> happened); in my experience the OSD traps and exits when something like
>>>> that happens.
>>>>
>>>> It would be interesting to know what would happen if you just did rm -rf
>>>> /var/lib/ceph/osd/ceph-4/current/* - that could be an equivalent to umount
>>>> -l, more or less :-)
>>>
>>> Will try that today and report back here.
>>>
>>> Cheers
>>> Goncalo
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW 2006
> T: +61 2 93511937

--
Email:
[email protected]
[email protected]
Life w/ Linux <http://i-shinobu.hatenablog.com/>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
