The hammer release is nearly end-of-life pending the release of
luminous. I wouldn't say it's a bug so much as a consequence of timing
out RADOS operations -- as I stated before, you most likely have
another thread stuck waiting on the cluster while that lock is held,
but you only provided the backtrace for a single thread.

On Tue, Aug 8, 2017 at 2:34 AM, Shilu <shi...@h3c.com> wrote:
> rbd_data.259fe1073f804.0000000000000929 925696~4096 should_complete: r = -110 
>    This is the timeout log entry; I have included a small excerpt from the log file.
>
> I stopped Ceph with "ceph osd pause" and then ran "ceph osd unpause". I use
> librbd through tgt; this causes a tgt thread to hang, and eventually tgt can
> no longer write data to Ceph.
>
>
> I tested this on Ceph 10.2.5 and it works fine there, so I think librbd has
> a bug in Ceph 0.94.5.
>
> My ceph.conf sets rados_mon_op_timeout = 75
>                   rados_osd_op_timeout = 75
>                   client_mount_timeout = 75
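For reference, these client-side timeout options normally live in the [client] section of ceph.conf; a sketch using the values quoted above:

```ini
[client]
# Fail blocked librados mon/osd operations with -ETIMEDOUT (-110)
# after 75 seconds instead of waiting indefinitely.
rados_mon_op_timeout = 75
rados_osd_op_timeout = 75
client_mount_timeout = 75
```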
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: August 8, 2017 7:58
> To: shilu 09816 (RD)
> Cc: ceph-users
> Subject: Re: hammer (0.94.5) librbd deadlock, how to resolve
>
> I am not sure what you mean by "I stop ceph" (stopped all the OSDs?)
> -- and I am not sure how you are seeing ETIMEDOUT errors on an "rbd_write"
> call, since it should just block if you mean stopping the OSDs. What is your
> use-case? Are you developing your own application on top of librbd?
>
> Regardless, I can only assume there is another thread that is blocked while 
> it owns the librbd::ImageCtx::owner_lock.
>
> On Mon, Aug 7, 2017 at 8:35 AM, Shilu <shi...@h3c.com> wrote:
>> I write data with rbd_write. When I stop Ceph, rbd_write times out and
>> returns -110.
>>
>> Then I call rbd_write again and it deadlocks; the stack trace is shown
>> below:
>> #0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
>> #1  0x00007fafbf9f75a0 in RWLock::get_read (this=0x7fafc48e1198) at ./common/RWLock.h:76
>> #2  0x00007fafbfa31de0 in RLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:130
>> #3  librbd::aio_write (ictx=0x7fafc48e1000, off=71516229632, len=4096,
>>     buf=0x7fafc499e000 "\235?[\257\367n\255\263?\200\034\061\341\r", c=0x7fafab44ef80, op_flags=0) at librbd/internal.cc:3320
>> #4  0x00007fafbf9eff19 in Context::complete (this=0x7fafab4174c0, r=<optimized out>) at ./include/Context.h:65
>> #5  0x00007fafbfb00016 in ThreadPool::worker (this=0x7fafc4852c40, wt=0x7fafc4948550) at common/WorkQueue.cc:128
>> #6  0x00007fafbfb010b0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:408
>> #7  0x00007fafc59b6184 in start_thread (arg=0x7fafadbed700) at pthread_create.c:312
>> #8  0x00007fafc52aaffd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>> ----------------------------------------------------------------------
>> ---------------------------------------------------------------
>> 本邮件及其附件含有新华三技术有限公司的保密信息,仅限于发送给上面地址中列出
>> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
>> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
>> 邮件!
>> This e-mail and its attachments contain confidential information from
>> New H3C, which is intended only for the person or entity whose address
>> is listed above. Any use of the information contained herein in any
>> way (including, but not limited to, total or partial disclosure,
>> reproduction, or dissemination) by persons other than the intended
>> recipient(s) is prohibited. If you receive this e-mail in error,
>> please notify the sender by phone or email immediately and delete it!
>
>
>
> --
> Jason



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
