Just to clarify - you unmounted the filesystem with "umount -l"? That's almost 
never a good idea, and it puts the OSD in a very unusual situation where IO 
will still work on the already-open files, but it can't open any new ones. I 
think this would be enough to confuse just about any piece of software.
Was the journal on the filesystem or on a separate partition/device?

It's not the same as a read-only filesystem (I hit that once and no such havoc 
happened); in my experience the OSD traps and exits when something like that 
happens.

It would be interesting to know what would happen if you just did rm -rf 
/var/lib/ceph/osd/ceph-4/current/* - that would be more or less equivalent to 
umount -l :-)
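For what it's worth, the state this leaves the OSD in can be reproduced in miniature with plain POSIX file semantics. A small sketch (not Ceph code, scratch paths only):

```shell
# A file held open keeps accepting IO after its name disappears, but new
# opens fail with ENOENT: the same state umount -l (or rm -rf on the data
# dir) leaves the OSD's already-open object files in.
tmp=$(mktemp -d)
echo data > "$tmp/object"
exec 3<"$tmp/object"        # hold an open fd, like the OSD's open files
rm "$tmp/object"            # the name vanishes, as after rm -rf
cat <&3                     # prints "data": IO on the open fd still works
cat "$tmp/object" 2>/dev/null || echo "new open fails: No such file or directory"
```

So the daemon's existing handles keep "working" while every fresh lookup hits ENOENT, which matches the errors in your log below.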

Jan



> On 20 Aug 2015, at 08:01, Goncalo Borges <[email protected]> wrote:
> 
> Dear Ceph gurus...
> 
> Just wanted to report something that may be interesting to enhance... or 
> maybe I am not doing the right debugging procedure. 
> 
> 1. I am working with 0.92.2 and I am testing the cluster in several disaster 
> scenarios.
> 
> 2. I have 32 OSDs distributed across 4 servers, meaning that I have 8 OSDs 
> per server.
> 
> 3. I deliberately unmounted the filesystem of osd.4 but left the daemon 
> running. I just wanted to understand how the system would react. This is 
> what happened:
> a. While there was no I/O, the system did not realize that the osd.4 
> filesystem was not mounted, and 'ceph -s' continued to report HEALTH_OK 
> for the system status. 
> 
> b. When I started to impose some heavy I/O, the system started to complain 
> about slow requests. Curiously, osd.4 never appeared in those messages.
> # ceph -s
>     cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
>      health HEALTH_WARN
>             170 requests are blocked > 32 sec
>      monmap e1: 3 mons at 
> {rccephmon1=192.231.127.8:6789/0,rccephmon2=192.231.127.34:6789/0,rccephmon3=192.231.127.26:6789/0}
>             election epoch 24, quorum 0,1,2 rccephmon1,rccephmon3,rccephmon2
>      mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
>      osdmap e1179: 32 osds: 32 up, 32 in
>       pgmap v907325: 2176 pgs, 2 pools, 4928 GB data, 1843 kobjects
>             14823 GB used, 74228 GB / 89051 GB avail
>                 2174 active+clean
>                    2 active+clean+replay
> 
> # ceph -w
> (...)          
> 2015-08-19 17:44:55.161731 osd.1 [WRN] 88 slow requests, 8 included below; 
> oldest blocked for > 3156.325716 secs
> 2015-08-19 17:44:55.161940 osd.1 [WRN] slow request 1920.533342 seconds old, 
> received at 2015-08-19 17:12:54.628258:         osd_op(client.44544.1:2266980 
> 1000000022a.00006aec [write 524288~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161950 osd.1 [WRN] slow request 1920.511098 seconds old, 
> received at 2015-08-19 17:12:54.650502:         osd_op(client.44544.1:2266988 
> 1000000022a.00006aec [write 1048576~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161957 osd.1 [WRN] slow request 1920.510451 seconds old, 
> received at 2015-08-19 17:12:54.651149:         osd_op(client.44544.1:2266996 
> 1000000022a.00006aec [write 1572864~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161963 osd.1 [WRN] slow request 1920.488589 seconds old, 
> received at 2015-08-19 17:12:54.673011:         osd_op(client.44544.1:2267004 
> 1000000022a.00006aec [write 2097152~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> 2015-08-19 17:44:55.161970 osd.1 [WRN] slow request 1920.482785 seconds old, 
> received at 2015-08-19 17:12:54.678815:         osd_op(client.44544.1:2267012 
> 1000000022a.00006aec [write 2621440~524288 [1@-1]] 5.e0cf740e snapc 1=[] 
> ondisk+write e1171) currently waiting for replay end
> (...)                 
> # grep "slow requests" /tmp/osd_failed.txt  | awk '{print $3}' | sort | uniq
> osd.1
> osd.11
> osd.17
> osd.23
> osd.24
> osd.26
> osd.27
> osd.31
> osd.7
> 
> c. None of the standard 'ceph osd' commands indicated that the problematic 
> OSD was osd.4. Only by looking at ceph-osd.4.log did we find write error 
> messages:
> 2015-08-19 16:52:17.552512 7f6f69973700  0 -- 10.100.1.167:6809/23763 >> 
> 10.100.1.169:6800/28352 pipe(0x175ca000 sd=169 :6809 s=0 pgs=0 cs=0 l=0 
> c=0x1f038000).accept connect_seq 180 vs existing 179 state standby
> 2015-08-19 16:52:17.566701 7f6f89d2a700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.567230 7f6f89d2a700  0 
> filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 
> 5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such file or 
> directory
> 2015-08-19 16:52:17.567332 7f6f89d2a700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.567365 7f6f89d2a700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.570206 7f6f8a72b700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.570233 7f6f8a72b700  0 
> filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 
> 5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such file or 
> directory
> 2015-08-19 16:52:17.570260 7f6f8a72b700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.570603 7f6f8a72b700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.572468 7f6f89d2a700 -1 
> filestore(/var/lib/ceph/osd/ceph-4) could not find 
> e6f81180/1000000022a.00000030/head//5 in index: (2) No such file or directory
> 2015-08-19 16:52:17.572497 7f6f89d2a700  0 
> filestore(/var/lib/ceph/osd/ceph-4) write couldn't open 
> 5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such file or 
> directory
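With hundreds of OSDs, one way to find the failing daemon without knowing it in advance would be to scan every OSD log for exactly these filestore errors. A rough sketch, assuming the default log location:

```shell
# Print only the OSD logs (default path assumed) that contain the
# filestore "could not find ... in index" errors shown above.
grep -l "could not find .* in index" /var/log/ceph/ceph-osd.*.log 2>/dev/null \
  || echo "no matching logs found"
```

That at least narrows the search to the hosts whose logs mention the error, instead of chasing the OSDs reporting slow requests.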
> 
> d. A restart of osd.4 remounted the osd.4 filesystem, but the slow requests 
> remained. I had to restart all the OSDs on the same server as osd.4. Once 
> that was done, ceph realized there were degraded PGs and started recovery 
> I/O.
> 
> If I am doing my debugging correctly, I see some potential problems:
> 
> - ceph was not capable of understanding that there was a missing or 
> problematic filesystem until a client tried to access it for I/O.
> - The source of the problem was not obvious from the warning messages ceph 
> was throwing. In this situation, if I were running hundreds of OSDs and 
> did not know in advance which of them was failing, it would be difficult 
> to identify the problematic one. 
> 
> Maybe there is a more efficient way to perform this kind of debugging? If 
> not, couldn't an additional check show that the daemon cannot properly 
> access its filesystem? I can imagine a situation where the filesystem is 
> mounted read-only, for example.
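For what it's worth, one shape such a check could take, purely as a sketch of the idea (not an existing Ceph feature; the path and probe file name are illustrative): periodically round-trip a probe file in the OSD data directory, so an unmounted, missing, or read-only filesystem fails the write or the read-back.

```shell
# Hypothetical probe (not an existing Ceph check): succeed only if the
# given OSD data dir is on a filesystem we can write to and read back from.
check_osd_fs() {
    probe="$1/health_probe"
    if echo ok 2>/dev/null > "$probe" && [ "$(cat "$probe" 2>/dev/null)" = ok ]
    then
        rm -f "$probe"          # clean up; the mount is writable and readable
        return 0
    else
        return 1                # missing, unmounted, read-only, or failing
    fi
}
check_osd_fs /var/lib/ceph/osd/ceph-4 && echo "osd.4 fs OK" || echo "osd.4 fs problem"
```

Run from cron or a monitoring agent, something like this would have flagged osd.4 long before any client I/O arrived.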
> 
> TIA
> Goncalo
> 
> -- 
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
