Hi,

On a Ceph Firefly cluster (version [1]), OSDs are configured to use separate
data and journal disks (using the ceph-disk utility). We observe that a few
OSDs start up fine (they reach the 'up' and 'in' state); however, others get
stuck in the 'init creating/touching snapmapper object' phase. Below is an OSD
start-up log snippet:

2015-08-06 08:58:02.491537 7fd312df97c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-08-06 08:58:02.498447 7fd312df97c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-08-06 08:58:02.498720 7fd312df97c0  2 osd.0 0 boot
2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0 a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init creating/touching snapmapper object

The log statement is misleading, though: at this point the code is actually
initializing the 'infos' object (as can be seen in the source [2]).
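
To tell which OSDs are affected without attaching a debugger, the last entry
of each OSD log can be checked for the message above. The following is only a
minimal sketch: the field layout (timestamp, thread id, debug level, message)
is inferred from the snippet above, and the names LOG_LINE, STUCK_MARKER,
last_entry and looks_stuck are made up for illustration. It assumes each log
entry sits on a single line, as in the on-disk log file.

```python
import re

# One OSD log entry: "<date> <time> <thread id> <level> <message>".
# This layout is inferred from the snippet in this mail, not from a Ceph spec.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>-?\d+)\s+"
    r"(?P<msg>.*)$"
)

# Message printed just before the OSD stops making progress.
STUCK_MARKER = "init creating/touching snapmapper object"


def last_entry(lines):
    """Return the last parseable log entry as a dict, or None."""
    last = None
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m:
            last = m.groupdict()
    return last


def looks_stuck(lines):
    """True if the final parseable entry is the snapmapper 'init' message."""
    entry = last_entry(lines)
    return entry is not None and entry["msg"].endswith(STUCK_MARKER)
```

Usage would be along the lines of looks_stuck(open(path)), where path is the
log file of the OSD in question. A True result only means that this message is
the most recent one logged; the OSD could still be making progress, so it is a
hint rather than proof of a hang.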

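Upon debugging further, the thread appears to be blocked in
pthread_cond_wait() inside ObjectStore::apply_transactions(), i.e. waiting on
the condition variable paired with the
'ObjectStore::apply_transaction::my_lock' mutex rather than on the mutex
itself. Below is the gdb backtrace: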

(gdb) where
#0  0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fd313132bf4 in ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, Context*) ()
#2  0x00007fd313097d08 in ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
#3  0x00007fd313076790 in OSD::init() ()
#4  0x00007fd3130233a7 in main ()
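
Frame #0 being pthread_cond_wait suggests the usual "queue the work, then
sleep on a condition variable until a completion callback fires" pattern. The
sketch below is a simplified Python analog of that pattern, not Ceph's code;
apply_and_wait, healthy_store and stalled_store are hypothetical names. It
illustrates why the caller would hang indefinitely if the queued transaction
never completes.

```python
import threading


def apply_and_wait(queue_work, timeout=None):
    """Queue work, then block until its completion callback fires.

    Returns True if the callback ran, False if `timeout` expired first.
    With timeout=None this waits forever, like an untimed condition wait.
    """
    cond = threading.Condition()   # stands in for the mutex + condition pair
    state = {"done": False}

    def on_complete():
        # Called by whoever finishes the work; wakes the waiting caller.
        with cond:
            state["done"] = True
            cond.notify_all()

    queue_work(on_complete)        # hand the callback to the "store"

    with cond:
        # wait_for re-checks the flag, so an early completion is not missed.
        return cond.wait_for(lambda: state["done"], timeout=timeout)


def healthy_store(on_complete):
    # A worker thread finishes the work and signals completion.
    threading.Thread(target=on_complete).start()


def stalled_store(on_complete):
    # The work is accepted but never completed, so nobody signals.
    pass
```

With healthy_store the call returns promptly; with stalled_store and no
timeout the caller would never return, which would match the backtrace above
if the transaction queued by OSD::init() is never completed. A dump of all
thread stacks from the stuck process should show which thread was expected to
signal completion.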

In a few cases, restarting the stuck OSD service allows it to complete the
'init' phase and reach the 'up' and 'in' state.

Any help is greatly appreciated. Please let me know if any more details are
required to root-cause the issue.

[1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
[2] - https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211

Regards,
Unmesh G.
IRC: unmeshg