So what is the situation where you need to do:

# cd /var/lib/ceph/osd/ceph-23/current
# rm -Rf *
# df
(...)
I'm quite sure that is not normal.

Shinobu

On Tue, Aug 25, 2015 at 9:41 AM, Goncalo Borges <[email protected]> wrote:
> Hi Jan...
>
> We were interested in the situation where an 'rm -Rf' is done in the
> current directory of an OSD. Here are my findings:
>
> 1. In this exercise, we simply deleted all the content of
> /var/lib/ceph/osd/ceph-23/current:
>
>    # cd /var/lib/ceph/osd/ceph-23/current
>    # rm -Rf *
>    # df
>    (...)
>    /dev/sdj1    2918054776    434548    2917620228    1%    /var/lib/ceph/osd/ceph-23
>
> 2. After some time, ceph enters an error state because it sees an
> inconsistent PG and several scrub errors:
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_ERR
>                1 pgs inconsistent
>                1850 scrub errors
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e1903: 32 osds: 32 up, 32 in
>          pgmap v1041261: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14424 GB used, 74627 GB / 89051 GB avail
>                    2175 active+clean
>                       1 active+clean+inconsistent
>  client io 989 B/s rd, 1 op/s
>
> 3. Looking at ceph.log on the mon, it is possible to see which PG is
> affected and which OSD is responsible for the errors:
>
>    # tail -f /var/log/ceph/ceph.log
>    (...)
>    2015-08-24 11:31:10.139239 osd.13 X.X.X.X:6804/20104 2384 : cluster [ERR]
>    be_compare_scrubmaps: 5.336 shard 23 missing e300336/100000001b0.00002825/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing 32600336/10000000109.00000754/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing dd700336/100000001ab.00000b91/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing bc220336/100000001bd.0000387c/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing f9320336/10000000201.00002e96/head//5
>    be_compare_scrubmaps: 5.336 shard 23 missing 1a920336/10000000228.0000d501/head//5
>    (... one 'shard 23 missing' entry per deleted object ...)
>    2015-08-24 11:31:14.336760 osd.13 X.X.X.X:6804/20104 2476 : cluster [ERR] 5.336 scrub 1850 missing, 0 inconsistent objects
>    2015-08-24 11:31:14.336764 osd.13 X.X.X.X:6804/20104 2477 : cluster [ERR] 5.336 scrub 1850 errors
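>
> (Side note: a quick way to double-check which OSDs host the affected PG is
> to map it directly; this is just a convenience check, with the PG id taken
> from the log above:
>
>    # ceph pg map 5.336        # prints the up and acting OSD sets for the PG
>
> In our case the acting set is [13,2,22], which is why osd.13, as the
> primary, is the one reporting the scrub errors against shard 23.)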
>
> 4. We then tried to restart the problematic OSD, but that fails:
>
>    # /etc/init.d/ceph stop osd.23
>    === osd.23 ===
>    Stopping Ceph osd.23 on osd3...done
>    [root@osd3 ~]# /etc/init.d/ceph start osd.23
>    === osd.23 ===
>    create-or-move updated item name 'osd.23' weight 2.72 at location {host=osd3,root=default} to crush map
>    Starting Ceph osd.23 on osd3...
>    starting osd.23 at :/0 osd_data /var/lib/ceph/osd/ceph-23 /var/lib/ceph/osd/ceph-23/journal
>
>    # tail -f /var/log/ceph/ceph-osd.23.log
>    2015-08-24 11:48:12.189322 7fa24d85d800  0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process ceph-osd, pid 7266
>    2015-08-24 11:48:12.389747 7fa24d85d800  0 filestore(/var/lib/ceph/osd/ceph-23) backend xfs (magic 0x58465342)
>    2015-08-24 11:48:12.391370 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP ioctl is supported and appears to work
>    2015-08-24 11:48:12.391381 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>    2015-08-24 11:48:12.404785 7fa24d85d800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: syscall(SYS_syncfs, fd) fully supported
>    2015-08-24 11:48:12.404874 7fa24d85d800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-23) detect_features: disabling extsize, kernel 2.6.32-504.16.2.el6.x86_64 is older than 3.5 and has buggy extsize ioctl
>    2015-08-24 11:48:12.405226 7fa24d85d800 -1 filestore(/var/lib/ceph/osd/ceph-23) mount initial op seq is 0; something is wrong
>    2015-08-24 11:48:12.405243 7fa24d85d800 -1 osd.23 0 OSD:init: unable to mount object store
>    2015-08-24 11:48:12.405251 7fa24d85d800 -1 ERROR: osd init failed: (22) Invalid argument
>
> 5. At this point, osd.23 is reported as 'down' but 'in', and ceph finally
> understands that there is an OSD down and that there are degraded PGs:
>
>    # ceph osd dump
>    (...)
>    osd.23 down in weight 1 up_from 276 up_thru 1838 down_at 1904 last_clean_interval [112,208) X.X.X.X:6812/30826 X.X.X.X:6812/30826 X.X.X.X:6813/30826 X.X.X.X:6813/30826 exists 3010de97-3080-42e2-9a64-4bb1960d40b4
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_ERR
>                202 pgs degraded
>                1 pgs inconsistent
>                201 pgs stuck unclean
>                202 pgs undersized
>                recovery 155149/5664204 objects degraded (2.739%)
>                1850 scrub errors
>                1/32 in osds are down
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e1905: 32 osds: 31 up, 32 in
>          pgmap v1041433: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14424 GB used, 74627 GB / 89051 GB avail
>                155149/5664204 objects degraded (2.739%)
>                    1974 active+clean
>                     201 active+undersized+degraded
>                       1 active+undersized+degraded+inconsistent
>  client io 1023 B/s rd, 0 op/s
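>
> (We did not need it in the end - recovery took care of things once the OSD
> was marked out, as shown below - but since the OSD refuses to mount its
> object store, the standard way out would be to retire it and re-create it.
> A rough sketch, with the OSD id to be adapted:
>
>    # ceph osd out 23                 # stop mapping data to it; by default the mons also mark it out automatically after ~5 min
>    # ceph osd crush remove osd.23    # remove it from the CRUSH map
>    # ceph auth del osd.23            # delete its authentication key
>    # ceph osd rm 23                  # remove it from the osdmap; then re-provision the disk as a fresh OSD
> )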
>
> 6. Recovery I/O starts and finishes, but the system remains in an error
> state because of the inconsistent PG:
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_ERR
>                1 pgs inconsistent
>                1850 scrub errors
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e2097: 32 osds: 31 up, 31 in
>          pgmap v1043287: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14838 GB used, 71430 GB / 86269 GB avail
>                    2172 active+clean
>                       2 active+clean+replay
>                       1 active+clean+scrubbing
>                       1 active+clean+inconsistent
>  client io 1063 B/s rd, 2 op/s
>
>    # ceph health detail
>    HEALTH_ERR 1 pgs inconsistent; 1850 scrub errors
>    pg 5.336 is active+clean+inconsistent, acting [13,2,22]
>    1850 scrub errors
>
> 7. The inconsistent PG is then repaired, and the system returns to a
> 'HEALTH_OK' status:
>
>    # ceph pg repair 5.336
>    instructing pg 5.336 on osd.13 to repair
>
>    $ tail -f /var/log/ceph/ceph.log
>    2015-08-24 12:30:49.583363 mon.0 X.X.X.X:6789/0 289961 : cluster [INF] pgmap v1043322: 2176 pgs: 2173 active+clean, 1 active+clean+inconsistent, 2 active+clean+replay; 4930 GB data, 14833 GB used, 71435 GB / 86269 GB avail; 1023 B/s rd, 1 op/s
>    2015-08-24 12:36:20.894597 osd.13 X.X.X.X:6804/20104 2496 : cluster [INF] 5.336 repair starts
>    2015-08-24 12:39:27.105511 osd.13 X.X.X.X:6804/20104 2497 : cluster [INF] 5.336 repair ok, 0 fixed
>
>    # ceph -s
>        cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>         health HEALTH_OK
>         monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X:6789/0,mon3=X.X.X.X:6789/0}
>                election epoch 24, quorum 0,1,2 mon1,mon3,mon2
>         mdsmap e162: 1/1/1 up {0=mds=up:active}, 1 up:standby-replay
>         osdmap e2097: 32 osds: 31 up, 31 in
>          pgmap v1043543: 2176 pgs, 2 pools, 4930 GB data, 1843 kobjects
>                14828 GB used, 71440 GB / 86269 GB avail
>                    2174 active+clean
>                       2 active+clean+replay
>  client io 1023 B/s rd, 1 op/s
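>
> (As a final sanity check, one could trigger a fresh deep scrub of the PG
> and watch the cluster log; just a suggestion, not something we ran here:
>
>    # ceph pg deep-scrub 5.336        # re-reads and compares all replicas of the PG's objects
>    # tail -f /var/log/ceph/ceph.log  # should eventually log something like '5.336 deep-scrub ok'
> )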
>
> On 08/24/2015 07:49 PM, Jan Schermer wrote:
>> I'm not talking about IO happening, I'm talking about file descriptors
>> staying open. If they weren't open you could umount it without the "-l".
>> Once you hit the OSD again all those open files will start working, and if
>> more need to be opened it will start looking for them...
>>
>> Jan
>>
>>> On 24 Aug 2015, at 03:07, Goncalo Borges <[email protected]> wrote:
>>>
>>> Hi Jan...
>>>
>>> Thanks for the reply.
>>>
>>> On 08/20/2015 07:31 PM, Jan Schermer wrote:
>>>> Just to clarify - you unmounted the filesystem with "umount -l"? That is
>>>> almost never a good idea, and it puts the OSD in a very unusual situation
>>>> where IO will actually work on the open files, but it can't open any new
>>>> ones. I think this would be enough to confuse just about any piece of
>>>> software.
>>>
>>> Yes, I did an 'umount -l', but I was sure that no I/O was happening at the
>>> time. So I was almost 100% sure that there was no real incoherence in
>>> terms of open files in the OS.
>>>
>>>> Was journal on the filesystem or on a separate partition/device?
>>>
>>> The journal is on the same disk, but in a different partition.
>>>
>>>> It's not the same as an R/O filesystem (I hit that once and no such havoc
>>>> happened); in my experience the OSD traps and exits when something like
>>>> that happens.
>>>>
>>>> It would be interesting to know what would happen if you just did rm -rf
>>>> /var/lib/ceph/osd/ceph-4/current/* - that could be an equivalent to umount
>>>> -l, more or less :-)
>>>
>>> Will try that today and report back here.
>>>
>>> Cheers
>>> Goncalo
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW 2006
> T: +61 2 93511937

--
Email:
[email protected]
[email protected]
Life w/ Linux <http://i-shinobu.hatenablog.com/>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
