Hope nobody ever does that.
Anyway, it's good to know for disaster recovery.

Thank you!

 Shinobu

On Tue, Aug 25, 2015 at 12:10 PM, Goncalo Borges <
[email protected]> wrote:

> Hi Shinobu
>
> Human mistake, for example :-) Not very frequent, but it happens.
>
> Nevertheless, the idea is to test ceph against different DC scenarios,
> triggered by different problems.
>
> In this particular situation, the cluster recovered OK once the
> problematic OSD daemon was tagged as 'down' and 'out'.
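>
> (Not from the test itself, just for reference: tagging the daemon by hand is
> plain ceph CLI, roughly
>
> # ceph osd down 23      # mark the daemon down in the osdmap
> # ceph osd out 23       # take it out of placement so recovery/backfill can start
>
> with 23 being the OSD id used in the test below.)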
>
> Cheers
> Goncalo
>
>
> On 08/25/2015 01:06 PM, Shinobu wrote:
>
> So what is the situation where you need to do:
>
> # cd /var/lib/ceph/osd/ceph-23/current
> # rm -Rf *
> # df
> (...)
>
> I'm quite sure that is not normal.
>
>  Shinobu
>
> On Tue, Aug 25, 2015 at 9:41 AM, Goncalo Borges <[email protected]> wrote:
>
>> Hi Jan...
>>
>> We were interested in the situation where an rm -Rf is done in the
>> current directory of the OSD. Here are my findings:
>>
>> 1. In this exercise, we simply deleted all the content of
>> /var/lib/ceph/osd/ceph-23/current.
>>
>> # cd /var/lib/ceph/osd/ceph-23/current
>> # rm -Rf *
>> # df
>> (...)
>> /dev/sdj1      2918054776    434548 2917620228   1%
>> /var/lib/ceph/osd/ceph-23
>>
>>
>>
>> 2. After some time, Ceph enters an error state because it thinks it has
>> an inconsistent PG and several scrub errors:
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_ERR
>>             1 pgs inconsistent
>>             1850 scrub errors
>>      monmap e1: 3 mons at
>> {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e1903: 32 osds: 32 up, 32 in
>>       pgmap v1041261: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14424 GB used, 74627 GB / 89051 GB avail
>>                 2175 active+clean
>>                    1 active+clean+inconsistent
>>   client io 989 B/s rd, 1 op/s
>>
>>
>> 3. Looking at ceph.log on the mon, it is possible to check which PG is
>> affected and which OSD is responsible for the error:
>>
>> # tail -f /var/log/ceph/ceph.log
>> (...)
>> 2015-08-24 11:31:10.139239 osd.13 X.X.X.X:6804/20104 2384 : cluster [ERR]
>> be_compare_scrubmaps: 5.336 shard 23 missing e300336/100000001b0.00002825/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 32600336/10000000109.00000754/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing dd700336/100000001ab.00000b91/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing bc220336/100000001bd.0000387c/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing f9320336/10000000201.00002e96/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 1a920336/10000000228.0000d501/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 24a20336/100000001bc.00003e06/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing cd20336/10000000227.00004775/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing cef20336/100000001b9.00002260/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing ba240336/100000001d8.00000630/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 3e740336/100000001b1.00002089/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing e840336/100000001ba.00002618/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 17b40336/100000000e9.00000287/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing b7950336/100000000e4.00000800/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 94560336/100000001b4.00002834/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 71370336/10000000051.00000179/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 62370336/100000001b5.00003b5b/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing e9670336/10000000120.000003f8/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 1b480336/1000000019a.00000d4b/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 11880336/100000001e8.000003e9/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 56c80336/10000000083.00000255/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 97790336/100000001e7.00000668/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing e4ca0336/100000001b6.0000278c/head//5
>> be_compare_scrubmaps: 5.336 shard 23 missing 4eda0336/1000000019e.000036ad/head//5
>> (...)
>> 2015-08-24 11:31:14.336760 osd.13 X.X.X.X:6804/20104 2476 : cluster [ERR]
>> 5.336 scrub 1850 missing, 0 inconsistent objects
>> 2015-08-24 11:31:14.336764 osd.13 X.X.X.X:6804/20104 2477 : cluster [ERR]
>> 5.336 scrub 1850 errors
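>>
>> (Not part of the original output, but a quick way to get a rough count of the
>> missing objects and to find the affected PG is
>>
>> # grep -c 'shard 23 missing' /var/log/ceph/ceph.log
>> # ceph health detail
>>
>> the second command prints the PG id and its acting set directly, as shown in
>> step 6 below.)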
>>
>> 4. We then try to restart the problematic OSD, but that fails:
>>
>> # /etc/init.d/ceph stop osd.23
>> === osd.23 ===
>> Stopping Ceph osd.23 on osd3...done
>> [root@osd3 ~]# /etc/init.d/ceph start osd.23
>> === osd.23 ===
>> create-or-move updated item name 'osd.23' weight 2.72 at location
>> {host=osd3,root=default} to crush map
>> Starting Ceph osd.23 on osd3...
>> starting osd.23 at :/0 osd_data /var/lib/ceph/osd/ceph-23
>> /var/lib/ceph/osd/ceph-23/journal
>>
>> # tail -f /var/log/ceph/ceph-osd.23.log
>> 2015-08-24 11:48:12.189322 7fa24d85d800  0 ceph version 0.94.2
>> (5fb85614ca8f354284c713a2f9c610860720bbf3), process ceph-osd, pid 7266
>> 2015-08-24 11:48:12.389747 7fa24d85d800  0
>> filestore(/var/lib/ceph/osd/ceph-23) backend xfs (magic 0x58465342)
>> 2015-08-24 11:48:12.391370 7fa24d85d800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP
>> ioctl is supported and appears to work
>> 2015-08-24 11:48:12.391381 7fa24d85d800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP
>> ioctl is disabled via 'filestore fiemap' config option
>> 2015-08-24 11:48:12.404785 7fa24d85d800  0
>> genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features:
>> syscall(SYS_syncfs, fd) fully supported
>> 2015-08-24 11:48:12.404874 7fa24d85d800  0
>> xfsfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: disabling
>> extsize, kernel 2.6.32-504.16.2.el6.x86_64 is older than 3.5 and has buggy
>> extsize ioctl
>> 2015-08-24 11:48:12.405226 7fa24d85d800 -1
>> filestore(/var/lib/ceph/osd/ceph-23) mount initial op seq is 0; something
>> is wrong
>> 2015-08-24 11:48:12.405243 7fa24d85d800 -1 osd.23 0 OSD:init: unable to
>> mount object store
>> 2015-08-24 11:48:12.405251 7fa24d85d800 -1   ERROR: osd init failed: (22)
>> Invalid argument
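>>
>> (Not from the original output, but to double-check that the daemon really did
>> not come back, something like
>>
>> # /etc/init.d/ceph status osd.23
>> # ceph osd tree | grep osd.23
>>
>> should show the process gone and the OSD flagged as down in the CRUSH tree.)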
>>
>>
>> 5. At this point, osd.23 is reported as 'down' but still 'in', and Ceph
>> finally understands that there is an OSD down and that there are degraded PGs:
>>
>> # ceph osd dump
>> (...)
>> osd.23 down in  weight 1 up_from 276 up_thru 1838 down_at 1904
>> last_clean_interval [112,208) X.X.X.X:6812/30826 10.100.1.169:6812/30826
>> 10.100.1.169:6813/30826 X.X.X.X:6813/30826 exists
>> 3010de97-3080-42e2-9a64-4bb1960d40b4
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_ERR
>>             202 pgs degraded
>>             1 pgs inconsistent
>>             201 pgs stuck unclean
>>             202 pgs undersized
>>             recovery 155149/5664204 objects degraded (2.739%)
>>             1850 scrub errors
>>             1/32 in osds are down
>>      monmap e1: 3 mons at
>> {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e1905: 32 osds: 31 up, 32 in
>>       pgmap v1041433: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14424 GB used, 74627 GB / 89051 GB avail
>>             155149/5664204 objects degraded (2.739%)
>>                 1974 active+clean
>>                  201 active+undersized+degraded
>>                    1 active+undersized+degraded+inconsistent
>>   client io 1023 B/s rd, 0 op/s
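>>
>> (Recovery progress can also be followed live, e.g. with
>>
>> # ceph -w
>>
>> or by re-running 'ceph -s' and watching the degraded object count drop; this
>> was not part of the original capture.)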
>>
>> 6. Recovery I/O starts and finishes, but the system remains in an error
>> state because of the inconsistent PG:
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_ERR
>>             1 pgs inconsistent
>>             1850 scrub errors
>>      monmap e1: 3 mons at
>> {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e2097: 32 osds: 31 up, 31 in
>>       pgmap v1043287: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14838 GB used, 71430 GB / 86269 GB avail
>>                 2172 active+clean
>>                    2 active+clean+replay
>>                    1 active+clean+scrubbing
>>                    1 active+clean+inconsistent
>>   client io 1063 B/s rd, 2 op/s
>>
>> # ceph health detail
>> HEALTH_ERR 1 pgs inconsistent; 1850 scrub errors
>> pg 5.336 is active+clean+inconsistent, acting [13,2,22]
>> 1850 scrub errors
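>>
>> (The PG can also be inspected directly before repairing it, e.g.
>>
>> # ceph pg 5.336 query
>>
>> which dumps its peering and scrub state plus the acting set as JSON; not from
>> the original session.)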
>>
>>
>> 7. The inconsistent PG is then repaired, and the system returns to a
>> 'HEALTH_OK' status:
>>
>> # ceph pg repair 5.336
>> instructing pg 5.336 on osd.13 to repair
>>
>> $ tail -f /var/log/ceph/ceph.log
>> 2015-08-24 12:30:49.583363 mon.0 X.X.X.X:6789/0 289961 : cluster [INF]
>> pgmap v1043322: 2176 pgs: 2173 active+clean, 1 active+clean+inconsistent, 2
>> active+clean+replay; 4930 GB data, 14833 GB used, 71435 GB / 86269 GB
>> avail; 1023 B/s rd, 1 op/s
>> 2015-08-24 12:36:20.894597 osd.13 192.231.127.168:6804/20104 2496 :
>> cluster [INF] 5.336 repair starts
>> 2015-08-24 12:39:27.105511 osd.13 192.231.127.168:6804/20104 2497 :
>> cluster [INF] 5.336 repair ok, 0 fixed
>>
>> # ceph -s
>>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>>      health HEALTH_OK
>>      monmap e1: 3 mons at
>> {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>>             election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>>      mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>>      osdmap e2097: 32 osds: 31 up, 31 in
>>       pgmap v1043543: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>>             14828 GB used, 71440 GB / 86269 GB avail
>>                 2174 active+clean
>>                    2 active+clean+replay
>>   client io 1023 B/s rd, 1 op/s
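>>
>> (osd.23 itself is still down with an empty data directory at this point. The
>> usual way forward - not shown above - is to remove it and recreate it from
>> scratch, roughly:
>>
>> # ceph osd out 23                       # already out by now in this test
>> # ceph osd crush remove osd.23
>> # ceph auth del osd.23
>> # ceph osd rm 23
>> # ceph-disk prepare --zap-disk /dev/sdj # assuming /dev/sdj is still the disk, as in step 1
>>
>> after which the new OSD backfills its share of the data.)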
>>
>>
>>
>>
>>
>> On 08/24/2015 07:49 PM, Jan Schermer wrote:
>>
>> I'm not talking about I/O happening, I'm talking about file descriptors
>> staying open. If they weren't open, you could unmount it without the "-l".
>> Once you hit the OSD again, all those open files will still work, and if
>> more need to be opened it will start looking for them...
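>>
>> (One way to actually see those descriptors on the lazily-unmounted store -
>> assuming the default pid file location under /var/run/ceph, and using osd.23
>> only as an example id - is something like:
>>
>> # PID=$(cat /var/run/ceph/osd.23.pid)
>> # ls -l /proc/$PID/fd | grep ceph-23
>>
>> anything still listed there keeps working until the daemon closes it.)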
>>
>> Jan
>>
>>
>>
>> On 24 Aug 2015, at 03:07, Goncalo Borges <[email protected]> wrote:
>>
>> Hi Jan...
>>
>> Thanks for the reply.
>>
>>
>> On 08/20/2015 07:31 PM, Jan Schermer wrote:
>>
>> Just to clarify - you unmounted the filesystem with "umount -l"? That is almost
>> never a good idea, and it puts the OSD in a very unusual situation where IO
>> will actually work on the open files, but it can't open any new ones. I
>> think this would be enough to confuse just about any piece of software.
>>
>> Yes, I did an 'umount -l' but I was sure that no I/O was happening at the 
>> time. So, I was almost 100% sure that there were no real incoherence in 
>> terms of open files in the OS.
>>
>>
>> Was the journal on the filesystem or on a separate partition/device?
>>
>> The journal is on the same disk, but in a different partition.
>>
>>
>> It's not the same as a R/O filesystem (I hit that once and no such havoc
>> happened); in my experience the OSD traps and exits when something like that
>> happens.
>>
>> It would be interesting to know what would happen if you just did rm -rf 
>> /var/lib/ceph/osd/ceph-4/current/* - that could be an equivalent to umount 
>> -l, more or less :-)
>>
>>
>> Will try that today and report back here.
>>
>> Cheers
>> Goncalo
>>
>>
>> --
>> Goncalo Borges
>> Research Computing
>> ARC Centre of Excellence for Particle Physics at the Terascale
>> School of Physics A28 | University of Sydney, NSW  2006
>> T: +61 2 93511937
>>
>>
>>
>>
>
>
> --
> Email:
>  [email protected]
>  [email protected]
>
>  Life w/ Linux <http://i-shinobu.hatenablog.com/>
>
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937
>
>
>
>


-- 
Email:
 [email protected]
 [email protected]

 Life w/ Linux <http://i-shinobu.hatenablog.com/>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
