Dear Ceph gurus...

Just wanted to report something that may be interesting to enhance... or maybe I am not doing the right debugging procedure.

1. I am working with 0.92.2 and I am testing the cluster in several disaster scenarios.

2. I have 32 OSDs distributed across 4 servers, that is, 8 OSDs per server.

3. I deliberately unmounted the filesystem of osd.4 but left the daemon running. I just wanted to understand how the system would react. This is what happened:

   a. While there was no I/O, the system did not realize that the
   osd.4 filesystem was not mounted, and 'ceph -s' continued to
   report HEALTH_OK for the cluster status.

   b. When I started to impose some heavy I/O, the system started to
   complain about slow requests. Curiously, osd.4 never appears in
   those warnings.

       # ceph -s
            cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
             health HEALTH_WARN
                    170 requests are blocked > 32 sec
              monmap e1: 3 mons at {rccephmon1=192.231.127.8:6789/0,rccephmon2=192.231.127.34:6789/0,rccephmon3=192.231.127.26:6789/0}
                     election epoch 24, quorum 0,1,2 rccephmon1,rccephmon3,rccephmon2
              mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
             osdmap e1179: 32 osds: 32 up, 32 in
              pgmap v907325: 2176 pgs, 2 pools, 4928 GB data, 1843 kobjects
                    14823 GB used, 74228 GB / 89051 GB avail
                        2174 active+clean
                           2 active+clean+replay

       # ceph -w
       (...)
       2015-08-19 17:44:55.161731 osd.1 [WRN] 88 slow requests, 8
       included below; oldest blocked for > 3156.325716 secs
       2015-08-19 17:44:55.161940 osd.1 [WRN] slow request 1920.533342
       seconds old, received at 2015-08-19 17:12:54.628258:
       osd_op(client.44544.1:2266980 1000000022a.00006aec [write
       524288~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
       currently waiting for replay end
       2015-08-19 17:44:55.161950 osd.1 [WRN] slow request 1920.511098
       seconds old, received at 2015-08-19 17:12:54.650502:
       osd_op(client.44544.1:2266988 1000000022a.00006aec [write
       1048576~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
       currently waiting for replay end
       2015-08-19 17:44:55.161957 osd.1 [WRN] slow request 1920.510451
       seconds old, received at 2015-08-19 17:12:54.651149:
       osd_op(client.44544.1:2266996 1000000022a.00006aec [write
       1572864~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
       currently waiting for replay end
       2015-08-19 17:44:55.161963 osd.1 [WRN] slow request 1920.488589
       seconds old, received at 2015-08-19 17:12:54.673011:
       osd_op(client.44544.1:2267004 1000000022a.00006aec [write
       2097152~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
       currently waiting for replay end
       2015-08-19 17:44:55.161970 osd.1 [WRN] slow request 1920.482785
       seconds old, received at 2015-08-19 17:12:54.678815:
       osd_op(client.44544.1:2267012 1000000022a.00006aec [write
       2621440~524288 [1@-1]] 5.e0cf740e snapc 1=[] ondisk+write e1171)
       currently waiting for replay end
       (...)
       # grep "slow requests" /tmp/osd_failed.txt  | awk '{print $3}' |
       sort | uniq
       osd.1
       osd.11
       osd.17
       osd.23
       osd.24
       osd.26
       osd.27
       osd.31
       osd.7


   c. None of the standard 'ceph osd' commands indicated that the
   problematic OSD was osd.4. Only by looking at ceph-osd.4.log did we
   find the write error messages:

        2015-08-19 16:52:17.552512 7f6f69973700  0 --
       10.100.1.167:6809/23763 >> 10.100.1.169:6800/28352
       pipe(0x175ca000 sd=169 :6809 s=0 pgs=0 cs=0 l=0
       c=0x1f038000).accept connect_seq 180 vs existing 179 state standby
       2015-08-19 16:52:17.566701 7f6f89d2a700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.567230 7f6f89d2a700  0
       filestore(/var/lib/ceph/osd/ceph-4) write couldn't open
       5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such
       file or directory
       2015-08-19 16:52:17.567332 7f6f89d2a700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.567365 7f6f89d2a700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.570206 7f6f8a72b700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.570233 7f6f8a72b700  0
       filestore(/var/lib/ceph/osd/ceph-4) write couldn't open
       5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such
       file or directory
       2015-08-19 16:52:17.570260 7f6f8a72b700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.570603 7f6f8a72b700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.572468 7f6f89d2a700 -1
       filestore(/var/lib/ceph/osd/ceph-4) could not find
       e6f81180/1000000022a.00000030/head//5 in index: (2) No such file
       or directory
       2015-08-19 16:52:17.572497 7f6f89d2a700  0
       filestore(/var/lib/ceph/osd/ceph-4) write couldn't open
       5.180_head/e6f81180/1000000022a.00000030/head//5: (2) No such
       file or directory

   d. Restarting osd.4 remounted the osd.4 filesystem, but the slow
   requests remained. I had to restart all the OSDs on the same
   server as osd.4. Once that was done, ceph realized there were
   degraded PGs and started recovery I/O. (A rough sketch of these
   commands follows this list.)
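
For reference, the unmount in step 3 and the restarts in step d translate to something like the following on a default sysvinit install; this is a sketch under those assumptions (the lazy unmount flag and the init scripts may differ on your setup), not the exact commands:

        # step 3: detach the osd.4 data dir while leaving the daemon running;
        # a lazy unmount (assumption) because ceph-osd holds files open there
        umount -l /var/lib/ceph/osd/ceph-4

        # step d: restarting only osd.4 remounted the filesystem but did not
        # clear the slow requests...
        /etc/init.d/ceph restart osd.4
        # ...only restarting all the OSDs on that server did
        /etc/init.d/ceph restart osd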

If I am doing my debugging correctly, I see some potential problems:

- ceph was not capable of understanding that there was a missing or problematic filesystem until a client tried to access it for I/O.
- Even then, the source of the problem was not obvious from the warning messages ceph was throwing. In this particular situation, if I were running hundreds of OSDs and did not know in advance which of them was failing, it would be difficult to identify the problematic one (the best I can see is the log sweep sketched below).
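
The only generic approach I can see today is to grep every ceph-osd log for the filestore errors shown above. A rough sketch, assuming the default /var/log/ceph layout (run it on each OSD server, or wrap it in an ssh loop):

        # flag OSD logs containing the filestore open/index errors seen above
        for log in /var/log/ceph/ceph-osd.*.log; do
            if grep -q -e "could not find" -e "write couldn't open" "$log"; then
                echo "possible filesystem problem: $log"
            fi
        done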

Maybe there is a more efficient way to perform this kind of debugging? If not, couldn't an additional check detect that the daemon cannot properly access its filesystem? I can imagine, for example, a situation where the filesystem goes read-only.
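
In the meantime, the workaround I can imagine is an external check run periodically on each OSD server, outside ceph itself. A minimal sketch, assuming the default /var/lib/ceph/osd/ceph-<id> data directories (the '.rw_check' file name is just an example):

        # check that every OSD data dir on this host is a real mountpoint
        # and is writable (this would also catch a filesystem gone read-only)
        for dir in /var/lib/ceph/osd/ceph-*; do
            id=${dir##*-}
            if ! mountpoint -q "$dir"; then
                echo "osd.$id: $dir is not mounted"
            elif ! touch "$dir/.rw_check" 2>/dev/null; then
                echo "osd.$id: $dir is not writable (read-only?)"
            else
                rm -f "$dir/.rw_check"
            fi
        done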

TIA
Goncalo

--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

