On Sun, Dec 30, 2012 at 10:56 PM, Samuel Just <sam.j...@inktank.com> wrote:
> Sorry for the delay.  A quick look at the log doesn't show anything
> obvious... Can you elaborate on how you caused the hang?
> -Sam
>

I am sorry for all the noise. The issue was almost certainly triggered
by a bug in the InfiniBand switch firmware: a per-port reset resolved
the ``wrong mark'' problem, or at least it has not shown up again for
a week. It took almost two days to resolve, because every connectivity
test I could run showed no timeouts or drops that could cause wrong
marks. Finally, I started playing with TCP settings and found that
enabling ipv4.tcp_low_latency makes a ``wrong mark'' event several
times more likely. That quickly narrowed the possible causes down to a
media-only problem, and I fixed it soon after.

> On Wed, Dec 19, 2012 at 3:53 AM, Andrey Korolyov <and...@xdel.ru> wrote:
>> Please take a look at the log below. This is a slightly different
>> bug: both osd processes on the node were stuck, eating all available
>> CPU, until I killed them. It can be reproduced by running parallel
>> exports of different images from the same client IP, using either
>> ``rbd export'' or API calls; after a couple of wrong ``downs'',
>> osd.19 and osd.27 finally got stuck. More interestingly, 10.5.0.33
>> hosts the hungriest set of virtual machines, which constantly eat
>> four of its twenty-four HT cores, and this node fails almost every
>> time. The underlying fs is XFS, ceph version gf9d090e. Most likely
>> my previous reports describe side effects of this problem.
>>
>> http://xdel.ru/downloads/ceph-log/osd-19_and_27_stuck.log.gz
>>
>> and timings for the monmap. The logs are from different hosts, so
>> they may have a time shift of tens of milliseconds:
>>
>> http://xdel.ru/downloads/ceph-log/timings-crash-osd_19_and_27.txt
>>
>> Thanks!
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
