Could you print the backtraces of all threads via "thread apply all bt" in gdb?
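
Some context on why the other threads matter: frame #0 of your trace is in pthread_cond_wait, which suggests the init thread is not contending for the mutex but waiting for some other thread to signal that the transaction has completed. If that is right, the thread that should deliver the completion is the one to look at. Below is a simplified C++ sketch of that "queue, then wait for the callback" pattern. It is an illustration under my own assumptions, not the actual Ceph code; queue_transaction and apply_transaction_sync are hypothetical names.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the store's worker: it performs the queued work
// on another thread and then invokes the completion callback.
void queue_transaction(std::function<void(int)> on_done, std::thread& worker) {
    worker = std::thread([on_done] { on_done(0); });
}

// Synchronous wrapper: queue the work, then block until the callback fires.
int apply_transaction_sync() {
    std::mutex my_lock;
    std::condition_variable my_cond;
    bool done = false;
    int result = -1;
    std::thread worker;

    queue_transaction([&](int r) {
        std::lock_guard<std::mutex> guard(my_lock);
        result = r;
        done = true;
        my_cond.notify_one();
    }, worker);

    {
        std::unique_lock<std::mutex> lock(my_lock);
        // If the worker never runs the callback, this wait never returns,
        // and a backtrace of this thread ends in pthread_cond_wait.
        my_cond.wait(lock, [&] { return done; });
    }
    worker.join();  // the worker must finish before the locals go out of scope
    return result;
}
```

In this sketch the caller returns 0 once the worker signals completion. A hang like yours would correspond to the worker side never reaching the callback, which is why the backtraces of all threads are useful.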

On Thu, Aug 6, 2015 at 7:52 PM, Gurjar, Unmesh <unmesh.gur...@hp.com> wrote:
> Hi,
>
> On a Ceph Firefly cluster (version [1]), OSDs are configured to use separate
> data and journal disks (using the ceph-disk utility). We observe that a few
> OSDs start up fine (they reach the 'up' and 'in' state); however, others are
> stuck in the 'init creating/touching snapmapper object' phase. Below is an OSD
> start-up log snippet:
>
> 2015-08-06 08:58:02.491537 7fd312df97c0  1 journal _open 
> /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 
> bytes, directio = 1, aio = 1
> 2015-08-06 08:58:02.498447 7fd312df97c0  1 journal _open 
> /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 
> bytes, directio = 1, aio = 1
> 2015-08-06 08:58:02.498720 7fd312df97c0  2 osd.0 0 boot
> 2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock 
> sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0 
> a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
> 2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init creating/touching 
> snapmapper object
>
> The log statement is inaccurate, though: at that point the code is actually
> initializing the 'infos' object (as can be seen in the source [2]).
>
> Upon debugging further, the thread appears to be blocked in pthread_cond_wait
> under the 'ObjectStore::apply_transaction::my_lock' mutex. Below is the
> backtrace:
>
> (gdb) where
> #0  0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007fd313132bf4 in 
> ObjectStore::apply_transactions(ObjectStore::Sequencer*, 
> std::list<ObjectStore::Transaction*, 
> std::allocator<ObjectStore::Transaction*> >&, Context*) ()
> #2  0x00007fd313097d08 in 
> ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
> #3  0x00007fd313076790 in OSD::init() ()
> #4  0x00007fd3130233a7 in main ()
>
> In a few cases, upon restarting the stuck OSD (service), it successfully 
> completes the 'init' phase and reaches the 'up' and 'in' state!
>
> Any help is greatly appreciated. Please let me know if any more details are
> needed to find the root cause.
>
> [1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> [2] -  https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211
>
> Regards,
> Unmesh G.
> IRC: unmeshg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards,

Wheat