Could you print the backtraces of all threads via "thread apply all bt"?
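The reason for asking: frame #0 in the trace is pthread_cond_wait, so the init thread looks like it has already queued the transaction and is blocked waiting for a completion callback from some other thread, rather than contending for the mutex itself. The sketch below illustrates that general pattern only. It is not the Ceph code; apply_sync, Submit, Completion and the timeout are hypothetical names introduced for illustration (the trace shows an untimed wait).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical types: a completion callback and an asynchronous submit call.
using Completion = std::function<void(int)>;
using Submit = std::function<void(Completion)>;

// Submit work, then block on a condition variable until the completion
// callback fires or the timeout elapses. Returns true if the callback fired,
// in which case *result holds the code it reported.
bool apply_sync(const Submit& submit, std::chrono::milliseconds timeout,
                int* result) {
  assert(result != nullptr);
  struct State {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    int r = 0;
  };
  // Shared ownership keeps the state alive if the callback arrives late.
  auto st = std::make_shared<State>();

  submit([st](int r) {
    {
      std::lock_guard<std::mutex> guard(st->m);
      st->r = r;
      st->done = true;
    }
    st->cv.notify_all();
  });

  std::unique_lock<std::mutex> lock(st->m);
  // With an untimed wait(), a callback that never fires blocks forever here.
  const bool fired =
      st->cv.wait_for(lock, timeout, [&st] { return st->done; });
  if (fired) *result = st->r;
  return fired;
}

// Case 1: a worker thread invokes the completion, so the waiter wakes up.
bool demo_completes(int* result) {
  std::thread worker;
  const bool fired = apply_sync(
      [&worker](Completion done) {
        worker = std::thread([done] { done(0); });
      },
      std::chrono::milliseconds(2000), result);
  worker.join();
  return fired;
}

// Case 2: nothing ever invokes the completion (for example, the thread that
// should do so is itself stuck), so the waiter only returns on timeout.
bool demo_stalls(int* result) {
  return apply_sync([](Completion) { /* completion is dropped */ },
                    std::chrono::milliseconds(50), result);
}
```

In the second case the waiting thread tells you nothing about the cause; the thread that should have fired the callback does, which is why the full set of backtraces is needed.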

On Thu, Aug 6, 2015 at 7:52 PM, Gurjar, Unmesh <unmesh.gur...@hp.com> wrote:
> Hi,
>
> On a Ceph Firefly cluster (version [1]), OSDs are configured to use separate
> data and journal disks (using the ceph-disk utility). It is observed, that
> few OSDs start-up fine (are 'up' and 'in' state); however, others are stuck
> in the 'init creating/touching snapmapper object' phase. Below is a OSD
> start-up log snippet:
>
> 2015-08-06 08:58:02.491537 7fd312df97c0 1 journal _open
> /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096
> bytes, directio = 1, aio = 1
> 2015-08-06 08:58:02.498447 7fd312df97c0 1 journal _open
> /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096
> bytes, directio = 1, aio = 1
> 2015-08-06 08:58:02.498720 7fd312df97c0 2 osd.0 0 boot
> 2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock
> sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0
> a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
> 2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init creating/touching
> snapmapper object
>
> The log statement is inaccurate though, since it is actually doing init
> operation for the 'infos' object (as can be observed from source [2]).
>
> Upon debugging further, the thread seems to be waiting to acquire the
> 'ObjectStore::apply_transaction::my_lock' mutex.
> Below is the debug trace:
>
> (gdb) where
> #0 0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007fd313132bf4 in
> ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> std::list<ObjectStore::Transaction*,
> std::allocator<ObjectStore::Transaction*> >&, Context*) ()
> #2 0x00007fd313097d08 in
> ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
> #3 0x00007fd313076790 in OSD::init() ()
> #4 0x00007fd3130233a7 in main ()
>
> In a few cases, upon restarting the stuck OSD (service), it successfully
> completes the 'init' phase and reaches the 'up' and 'in' state!
>
> Any help is greatly appreciated. Please let me know if any more details are
> required for root causing.
>
> [1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> [2] - https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211
>
> Regards,
> Unmesh G.
> IRC: unmeshg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best Regards,

Wheat