On 12/10/2014 17:48, Gregory Farnum wrote:
> On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary <[email protected]> wrote:
>> Hi,
>>
>> On a 0.80.6 cluster the command
>>
>>    ceph tell osd.6 version
>>
>> hangs forever. I checked that it establishes a TCP connection to the OSD,
>> raised the OSD debug level to 20 and I do not see
>>
>> https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L4991
>>
>> in the logs. All other OSDs answer the same "version" command as they
>> should. And ceph daemon osd.6 version on the machine running OSD 6 responds
>> as it should. There is also an ever-growing number of slow requests on this
>> OSD, but no errors in the logs. In other words, except for taking forever to
>> answer any kind of request, the OSD looks fine.
>>
>> Another OSD running on the same machine is behaving well.
>>
>> Any idea what that behaviour relates to?
>
> What commands have you run? The admin socket commands don't require
> nearly as many locks, nor do they go through the same event loops that
> messages do. You might have found a deadlock or something. (In which
> case just restarting the OSD would probably fix it, but you should
> grab a core dump first.)
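Since the symptom is one OSD hanging on "ceph tell" while the others answer, a minimal sketch like the following could probe every OSD without blocking the shell forever. It assumes the ceph CLI is on PATH with admin credentials and that GNU coreutils timeout is installed; probe_osds and its 10-second default are hypothetical choices for illustration, not Ceph tooling.

```shell
# Hypothetical helper (not part of Ceph): ask every OSD for its version,
# giving up after a time limit so a hung OSD is reported instead of
# blocking the shell forever.
# Assumptions: ceph CLI on PATH with admin credentials, GNU coreutils timeout.
probe_osds() {
  local limit="${1:-10}"   # seconds to wait per OSD (arbitrary default)
  local id
  # "ceph osd ls" prints one OSD id per line.
  for id in $(ceph osd ls); do
    # The JSON reply is discarded; only the exit status matters. timeout
    # exits with 124 when the limit is hit, and ceph itself exits non-zero
    # on other failures, so both cases are reported.
    if ! timeout "$limit" ceph tell "osd.$id" version >/dev/null 2>&1; then
      echo "osd.$id did not answer within ${limit}s"
    fi
  done
}
```

After defining the function, running probe_osds 10 prints one line per OSD that failed or did not answer in time, for example "osd.6 did not answer within 10s", and nothing for healthy OSDs.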
# /etc/init.d/ceph stop osd.6
=== osd.6 ===
Stopping Ceph osd.6 on g3...kill 23690...kill 23690...done
root@g3:/var/lib/ceph/osd/ceph-6/current# /etc/init.d/ceph start osd.6
=== osd.6 ===
Starting Ceph osd.6 on g3...
starting osd.6 at :/0 osd_data /var/lib/ceph/osd/ceph-6 /var/lib/ceph/osd/ceph-6/journal
root@g3:/var/lib/ceph/osd/ceph-6/current# ceph tell osd.6 version
{ "version": "ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)"}
root@g3:/var/lib/ceph/osd/ceph-6/current# ceph tell osd.6 version
And now it blocks: the first request right after the restart was answered, but the second one hangs. It looks like a deadlock happens shortly after the OSD boots.
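Following Gregory's advice to grab a core dump before restarting, a hypothetical helper along these lines could snapshot the hung daemon. It assumes gdb (which ships gcore) is installed and that the init script wrote the daemon's PID to /var/run/ceph/osd.ID.pid; snapshot_osd and the PIDFILE_DIR override are illustrative names, not Ceph tooling.

```shell
# Hypothetical helper (not part of Ceph): capture the state of a hung
# ceph-osd before restarting it.
# Assumptions: gdb and gcore are installed; the init script wrote the PID to
# $PIDFILE_DIR/osd.<id>.pid (PIDFILE_DIR defaults to /var/run/ceph).
snapshot_osd() {
  local id="$1"
  local outdir="${2:-/tmp}"
  local pidfile="${PIDFILE_DIR:-/var/run/ceph}/osd.$id.pid"
  local pid
  pid=$(cat "$pidfile") || return 1
  # Backtraces of every thread, which may show which locks threads wait on.
  # gdb attaches, prints, then detaches, leaving the daemon running.
  gdb -p "$pid" -batch -ex 'thread apply all bt' > "$outdir/osd.$id.bt.txt" 2>&1
  # Full core for later inspection; gcore names the file <prefix>.<pid>.
  gcore -o "$outdir/osd.$id.core" "$pid"
}
```

For example, snapshot_osd 6 /var/tmp would write osd.6.bt.txt and osd.6.core.<pid> under /var/tmp, after which the OSD can be restarted as in the transcript above.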
--
Loïc Dachary, Artisan Logiciel Libre
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
