Please find ceph.conf at [1] and the corresponding OSD log at [2].

To clarify one thing I skipped earlier: while bringing up the OSDs, 
'ceph-disk activate' was hanging (due to issue [3]). To work around this, I 
had to temporarily disable 'journal dio' to get the disk activated (with 
'mark-init' set to none) and then explicitly start the OSD service after 
updating the conf to re-enable 'journal dio'. I am hopeful that this is not 
the cause of the present issue (since a few OSDs start successfully on the 
first attempt and others on subsequent service restarts)!
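
For anyone reproducing the workaround, the conf toggle can be sketched roughly
as below. This is only an illustrative helper, not something shipped with Ceph:
the function name set_osd_option is hypothetical, and it assumes the option
lives in the [osd] section of ceph.conf (the conf at [1] may place it
elsewhere). It treats spaces and underscores in option names as equivalent.

```python
def set_osd_option(conf_text, key, value):
    """Return ceph.conf-style text with `key = value` set in the [osd] section.

    Hypothetical helper for illustration; assumes the option belongs in [osd].
    """
    want = key.replace("_", " ").strip()
    out = []
    in_osd = False
    done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [osd] without having seen the key: append it first.
            if in_osd and not done:
                out.append(f"{want} = {value}")
                done = True
            in_osd = stripped == "[osd]"
        elif in_osd and "=" in stripped and not stripped.startswith(("#", ";")):
            name = stripped.split("=", 1)[0].strip().replace("_", " ")
            if name == want:
                # Overwrite the existing setting in place.
                line = f"{want} = {value}"
                done = True
        out.append(line)
    if not done:
        # Key never seen: add it, creating the [osd] section if needed.
        if not in_osd:
            out.append("[osd]")
        out.append(f"{want} = {value}")
    return "\n".join(out) + "\n"
```

The idea would be to call it with "false" before 'ceph-disk activate' and
with "true" before starting the OSD service, writing the result back to the
conf file each time.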

[1] - http://paste.openstack.org/show/411161/
[2] - http://paste.openstack.org/show/411162/
[3] - http://tracker.ceph.com/issues/9768

Regards,
Unmesh G.
IRC: unmeshg

> -----Original Message-----
> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> Sent: Thursday, August 06, 2015 6:22 PM
> To: Gurjar, Unmesh
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: OSD sometimes stuck in init phase
> 
> I don't see anything strange.
> 
> Could you paste your ceph.conf? And restart this osd with debug_osd=20/20,
> debug_filestore=20/20 :-)
> 
> On Thu, Aug 6, 2015 at 8:09 PM, Gurjar, Unmesh <unmesh.gur...@hp.com>
> wrote:
> > Thanks for the quick response, Haomai! Please find the backtrace here [1].
> >
> > [1] - http://paste.openstack.org/show/411139/
> >
> > Regards,
> > Unmesh G.
> > IRC: unmeshg
> >
> >> -----Original Message-----
> >> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> >> Sent: Thursday, August 06, 2015 5:31 PM
> >> To: Gurjar, Unmesh
> >> Cc: ceph-devel@vger.kernel.org
> >> Subject: Re: OSD sometimes stuck in init phase
> >>
> >> Could you print the backtraces of all threads via "thread apply all bt"?
> >>
> >> On Thu, Aug 6, 2015 at 7:52 PM, Gurjar, Unmesh <unmesh.gur...@hp.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > On a Ceph Firefly cluster (version [1]), OSDs are configured to use
> >> > separate data and journal disks (using the ceph-disk utility). It is
> >> > observed that a few OSDs start up fine (are in the 'up' and 'in'
> >> > state); however, others are stuck in the 'init creating/touching
> >> > snapmapper object' phase. Below is an OSD start-up log snippet:
> >> >
> >> > 2015-08-06 08:58:02.491537 7fd312df97c0  1 journal _open
> >> > /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size
> >> > 4096 bytes, directio = 1, aio = 1
> >> > 2015-08-06 08:58:02.498447 7fd312df97c0  1 journal _open
> >> > /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size
> >> > 4096 bytes, directio = 1, aio = 1
> >> > 2015-08-06 08:58:02.498720 7fd312df97c0  2 osd.0 0 boot
> >> > 2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock
> >> > sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0
> >> > a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
> >> > 2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init
> >> > creating/touching snapmapper object
> >> >
> >> > The log statement is inaccurate though, since it is actually
> >> > performing the init operation for the 'infos' object (as can be
> >> > observed from the source [2]).
> >> >
> >> > Upon debugging further, the thread seems to be waiting to acquire
> >> > the 'ObjectStore::apply_transaction::my_lock' mutex. Below is the
> >> > debug trace:
> >> >
> >> > (gdb) where
> >> > #0  0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from
> >> > /lib/x86_64-linux-gnu/libpthread.so.0
> >> > #1  0x00007fd313132bf4 in
> >> > ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> >> > std::list<ObjectStore::Transaction*,
> >> > std::allocator<ObjectStore::Transaction*> >&, Context*) ()
> >> > #2  0x00007fd313097d08 in
> >> > ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*)
> >> > ()
> >> > #3  0x00007fd313076790 in OSD::init() ()
> >> > #4  0x00007fd3130233a7 in main ()
> >> >
> >> > In a few cases, upon restarting the stuck OSD (service), it
> >> > successfully completes the 'init' phase and reaches the 'up' and
> >> > 'in' state!
> >> >
> >> > Any help is greatly appreciated. Please let me know if any more
> >> > details are required for root-causing.
> >> >
> >> > [1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >> > [2] - https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211
> >> >
> >> > Regards,
> >> > Unmesh G.
> >> > IRC: unmeshg
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> > in the body of a message to majord...@vger.kernel.org More
> >> > majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >>
> >> Wheat
> 
> 
> 
> --
> Best Regards,
> 
> Wheat
