Hope nobody ever does that. Anyway, that's good to know in case of disaster recovery.
Thank you!

Shinobu

On Tue, Aug 25, 2015 at 12:10 PM, Goncalo Borges <[email protected]> wrote:

> Hi Shinobu
>
> Human mistake, for example :-) Not very frequent, but it happens.
>
> Nevertheless, the idea is to test ceph against different DC scenarios,
> triggered by different problems.
>
> In this particular situation, the cluster recovered ok ONCE the
> problematic OSD daemon was tagged as 'down' and 'out'.
>
> Cheers
> Goncalo
>
>
> On 08/25/2015 01:06 PM, Shinobu wrote:
>
> So what is the situation where you need to do:
>
> # cd /var/lib/ceph/osd/ceph-23/current
> # rm -Rf *
> # df
> (...)
>
> I'm quite sure that is not normal.
>
> Shinobu
>
> On Tue, Aug 25, 2015 at 9:41 AM, Goncalo Borges <[email protected]> wrote:
>
>> Hi Jan...
>>
>> We were interested in the situation where an 'rm -Rf' is done in the
>> 'current' directory of the OSD. Here are my findings:
>>
>> 1. In this exercise, we simply deleted all the content of
>> /var/lib/ceph/osd/ceph-23/current.
>>
>> # cd /var/lib/ceph/osd/ceph-23/current
>> # rm -Rf *
>> # df
>> (...)
>> /dev/sdj1   2918054776   434548   2917620228   1%   /var/lib/ceph/osd/ceph-23
>>
>> 2. After some time, ceph enters an error state because it thinks it has
>> an inconsistent PG and several scrub errors:
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_ERR
>>             1 pgs inconsistent
>>             1850 scrub errors
>>      monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e1903: 32 osds: 32 up, 32 in
>>       pgmap v1041261: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14424 GB used, 74627 GB / 89051 GB avail
>>                 2175 active+clean
>>                    1 active+clean+inconsistent
>>   client io 989 B/s rd, 1 op/s
>>
>> 3. Looking at ceph.log on the mon, it is possible to check which PG is
>> affected and which OSD is responsible for the error:
>>
>> # tail -f /var/log/ceph/ceph.log
>> (...)
>> 2015-08-24 11:31:10.139239 osd.13 X.X.X.X:6804/20104 2384 : cluster [ERR]
>> be_compare_scrubmaps: 5.336 shard 23 missing e300336/100000001b0.00002825/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 32600336/10000000109.00000754/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing dd700336/100000001ab.00000b91/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing bc220336/100000001bd.0000387c/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing f9320336/10000000201.00002e96/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 1a920336/10000000228.0000d501/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 24a20336/100000001bc.00003e06/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing cd20336/10000000227.00004775/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing cef20336/100000001b9.00002260/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing ba240336/100000001d8.00000630/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 3e740336/100000001b1.00002089/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing e840336/100000001ba.00002618/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 17b40336/100000000e9.00000287/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing b7950336/100000000e4.00000800/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 94560336/100000001b4.00002834/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 71370336/10000000051.00000179/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 62370336/100000001b5.00003b5b/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing e9670336/10000000120.000003f8/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 1b480336/1000000019a.00000d4b/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 11880336/100000001e8.000003e9/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 56c80336/10000000083.00000255/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 97790336/100000001e7.00000668/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing e4ca0336/100000001b6.0000278c/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 4eda0336/1000000019e.000036ad/head//5
>> (...)
>> 2015-08-24 11:31:14.336760 osd.13 X.X.X.X:6804/20104 2476 : cluster [ERR] 5.336 scrub 1850 missing, 0 inconsistent objects
>> 2015-08-24 11:31:14.336764 osd.13 X.X.X.X:6804/20104 2477 : cluster [ERR] 5.336 scrub 1850 errors
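Rather than tailing ceph.log, the affected PG and its acting set can also be queried directly from the monitors. A minimal sketch, assuming the same cluster and the Hammer-era CLI (exact output may differ slightly):

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1850 scrub errors
pg 5.336 is active+clean+inconsistent, acting [13,2,22]

# ceph pg dump | grep inconsistent    # list every PG whose state includes 'inconsistent'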
>> 4. We then tried to restart the problematic OSD, but that failed:
>>
>> # /etc/init.d/ceph stop osd.23
>> === osd.23 ===
>> Stopping Ceph osd.23 on osd3...done
>> [root@osd3 ~]# /etc/init.d/ceph start osd.23
>> === osd.23 ===
>> create-or-move updated item name 'osd.23' weight 2.72 at location {host=osd3,root=default} to crush map
>> Starting Ceph osd.23 on osd3...
>> starting osd.23 at :/0 osd_data /var/lib/ceph/osd/ceph-23 /var/lib/ceph/osd/ceph-23/journal
>>
>> # tail -f /var/log/ceph/ceph-osd.23.log
>> 2015-08-24 11:48:12.189322 7fa24d85d800  0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process ceph-osd, pid 7266
>> 2015-08-24 11:48:12.389747 7fa24d85d800  0 filestore(/var/lib/ceph/osd/ceph-23) backend xfs (magic 0x58465342)
>> 2015-08-24 11:48:12.391370 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP ioctl is supported and appears to work
>> 2015-08-24 11:48:12.391381 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2015-08-24 11:48:12.404785 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: syscall(SYS_syncfs, fd) fully supported
>> 2015-08-24 11:48:12.404874 7fa24d85d800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: disabling extsize, kernel 2.6.32-504.16.2.el6.x86_64 is older than 3.5 and has buggy extsize ioctl
>> 2015-08-24 11:48:12.405226 7fa24d85d800 -1 filestore(/var/lib/ceph/osd/ceph-23) mount initial op seq is 0; something is wrong
>> 2015-08-24 11:48:12.405243 7fa24d85d800 -1 osd.23 0 OSD:init: unable to mount object store
>> 2015-08-24 11:48:12.405251 7fa24d85d800 -1 ERROR: osd init failed: (22) Invalid argument
>>
>> 5. At this point, osd.23 is reported as 'down' but 'in', and ceph finally
>> understands that there is an OSD down and that there are degraded PGs:
>>
>> # ceph osd dump
>> (...)
>> osd.23 down in  weight 1 up_from 276 up_thru 1838 down_at 1904 last_clean_interval [112,208) X.X.X.X:6812/30826 10.100.1.169:6812/30826 10.100.1.169:6813/30826 X.X.X.X:6813/30826 exists 3010de97-3080-42e2-9a64-4bb1960d40b4
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_ERR
>>             202 pgs degraded
>>             1 pgs inconsistent
>>             201 pgs stuck unclean
>>             202 pgs undersized
>>             recovery 155149/5664204 objects degraded (2.739%)
>>             1850 scrub errors
>>             1/32 in osds are down
>>      monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e1905: 32 osds: 31 up, 32 in
>>       pgmap v1041433: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14424 GB used, 74627 GB / 89051 GB avail
>>             155149/5664204 objects degraded (2.739%)
>>                 1974 active+clean
>>                  201 active+undersized+degraded
>>                    1 active+undersized+degraded+inconsistent
>>   client io 1023 B/s rd, 0 op/s
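A down OSD is only marked 'out' automatically once mon_osd_down_out_interval expires (a few minutes by default), and backfill to the remaining OSDs starts only after that. If the OSD's contents are known to be gone, it can be marked out by hand to start recovery sooner; a sketch, assuming osd.23 as above:

# ceph osd out 23    # mark the emptied OSD out so its PGs re-replicate onto the remaining OSDs
# ceph -s            # the degraded object count should start dropping as recovery proceeds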
>> 6. Recovery I/O starts and finishes, but the system remains in an error
>> state because of the inconsistent PG:
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_ERR
>>             1 pgs inconsistent
>>             1850 scrub errors
>>      monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e2097: 32 osds: 31 up, 31 in
>>       pgmap v1043287: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14838 GB used, 71430 GB / 86269 GB avail
>>                 2172 active+clean
>>                    2 active+clean+replay
>>                    1 active+clean+scrubbing
>>                    1 active+clean+inconsistent
>>   client io 1063 B/s rd, 2 op/s
>>
>> # ceph health detail
>> HEALTH_ERR 1 pgs inconsistent; 1850 scrub errors
>> pg 5.336 is active+clean+inconsistent, acting [13,2,22]
>> 1850 scrub errors
>>
>> 7. The inconsistent PG is then repaired, and the system returns to a
>> 'HEALTH_OK' status:
>>
>> # ceph pg repair 5.336
>> instructing pg 5.336 on osd.13 to repair
>>
>> $ tail -f /var/log/ceph/ceph.log
>> 2015-08-24 12:30:49.583363 mon.0 X.X.X.X:6789/0 289961 : cluster [INF] pgmap v1043322: 2176 pgs: 2173 active+clean, 1 active+clean+inconsistent, 2 active+clean+replay; 4930 GB data, 14833 GB used, 71435 GB / 86269 GB avail; 1023 B/s rd, 1 op/s
>> 2015-08-24 12:36:20.894597 osd.13 192.231.127.168:6804/20104 2496 : cluster [INF] 5.336 repair starts
>> 2015-08-24 12:39:27.105511 osd.13 192.231.127.168:6804/20104 2497 : cluster [INF] 5.336 repair ok, 0 fixed
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_OK
>>      monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e2097: 32 osds: 31 up, 31 in
>>       pgmap v1043543: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14828 GB used, 71440 GB / 86269 GB avail
>>                 2174 active+clean
>>                    2 active+clean+replay
>>   client io 1023 B/s rd, 1 op/s
>>
>>
>> On 08/24/2015 07:49 PM, Jan Schermer wrote:
>>
>> I'm not talking about IO happening, I'm talking about file descriptors
>> staying open. If they weren't open you could umount it without the "-l".
>> Once you hit the OSD again all those open files will start working, and if
>> more need to be opened it will start looking for them...
>>
>> Jan
>>
>>
>> On 24 Aug 2015, at 03:07, Goncalo Borges <[email protected]> wrote:
>>
>> Hi Jan...
>>
>> Thanks for the reply.
>>
>> Yes, I did an 'umount -l' but I was sure that no I/O was happening at the
>> time. So, I was almost 100% sure that there was no real incoherence in
>> terms of open files in the OS.
>>
>>
>> On 08/20/2015 07:31 PM, Jan Schermer wrote:
>>
>> Just to clarify - you unmounted the filesystem with "umount -l"? That's
>> almost never a good idea, and it puts the OSD in a very unusual situation
>> where IO will actually work on the open files, but it can't open any new
>> ones. I think this would be enough to confuse just about any piece of
>> software.
>>
>> Yes, I did an 'umount -l' but I was sure that no I/O was happening at the
>> time. So, I was almost 100% sure that there was no real incoherence in
>> terms of open files in the OS.
>>
>>
>> Was the journal on the filesystem or on a separate partition/device?
>>
>> The journal is on the same disk, but in a different partition.
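For a FileStore OSD this is easy to confirm from the data directory: 'journal' is either a plain file (journal on the same filesystem) or a symlink to the journal partition. A quick check, assuming osd.23 and Hammer-era tooling:

# ls -l /var/lib/ceph/osd/ceph-23/journal    # a symlink here points at the separate journal partition
# ceph-disk list                             # shows each disk's data and journal partitions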
>>
>> It's not the same as an R/O filesystem (I hit that once and no such havoc
>> happened); in my experience the OSD traps and exits when something like
>> that happens.
>>
>> It would be interesting to know what would happen if you just did
>> rm -rf /var/lib/ceph/osd/ceph-4/current/* - that could be an equivalent
>> to 'umount -l', more or less :-)
>>
>> Will try that today and report back here.
>>
>> Cheers
>> Goncalo
>>
>>
>> --
>> Goncalo Borges
>> Research Computing
>> ARC Centre of Excellence for Particle Physics at the Terascale
>> School of Physics A28 | University of Sydney, NSW 2006
>> T: +61 2 93511937
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Email:
> [email protected]
> [email protected]
>
> Life w/ Linux <http://i-shinobu.hatenablog.com/>
>
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW 2006
> T: +61 2 93511937
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Email:
[email protected]
[email protected]

Life w/ Linux <http://i-shinobu.hatenablog.com/>
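Since an OSD whose current/ directory has been wiped can no longer mount its object store (step 4 above), the usual way forward is to remove it and recreate it, letting Ceph backfill from the surviving replicas. A rough sketch of that sequence on Hammer, assuming osd.23 on /dev/sdj as in the test above (this was not part of the exercise in the thread):

# ceph osd out 23                 # if not already out
# /etc/init.d/ceph stop osd.23    # make sure the failed daemon is stopped
# ceph osd crush remove osd.23    # remove it from the CRUSH map
# ceph auth del osd.23            # delete its cephx key
# ceph osd rm 23                  # remove it from the OSD map
# ceph-disk zap /dev/sdj          # wipe the old partitions (destroys whatever is left on the disk)
# ceph-disk prepare /dev/sdj      # repartition and prepare a fresh OSD
# ceph-disk activate /dev/sdj1    # bring the new OSD up; backfill repopulates it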
