On Mon, Nov 01, 2010 at 03:57:55PM -0700, Sage Weil wrote:
> Is there something in dmesg before the osd22 seq number errors pop up?
Yup, you were quite right. There was a bad crc that probably caused
the seq's to get out of sync.
Nov 1 10:12:50 bdio20 kernel: [233439.052725] ceph: osd22 10.138.138.13:6804 bad crc
Nov 1 10:12:51 bdio20 kernel: [233440.672738] ceph: skipping osd22 192.168.168.13:6804 seq 1, expected 2
Nov 1 10:12:51 bdio20 kernel: [233440.672958] ceph: skipping osd22 192.168.168.13:6804 seq 2, expected 3
Nov 1 10:12:51 bdio20 kernel: [233440.675705] ceph: skipping osd22 192.168.168.13:6804 seq 3, expected 4
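For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that picks out the two kinds of messages shown above. The regular expressions are guessed from these few lines only, the function name classify() is made up for illustration, and it assumes each kernel message sits on a single line (not wrapped by a mail client):

```python
import re

# Patterns inferred from the log excerpt above; they are assumptions,
# not an official description of the ceph kernel client's log format.
BAD_CRC = re.compile(r"ceph: (osd\d+) \S+ bad crc")
SKIP = re.compile(r"ceph: skipping (osd\d+) \S+ seq (\d+),? expected (\d+)")

def classify(lines):
    """Return (kind, osd, detail) tuples for the ceph events found in lines."""
    events = []
    for line in lines:
        m = BAD_CRC.search(line)
        if m:
            events.append(("bad_crc", m.group(1), None))
            continue
        m = SKIP.search(line)
        if m:
            # detail is (seq received, seq expected)
            events.append(("skip", m.group(1), (int(m.group(2)), int(m.group(3)))))
    return events

SAMPLE = [
    "Nov 1 10:12:50 bdio20 kernel: [233439.052725] ceph: osd22 10.138.138.13:6804 bad crc",
    "Nov 1 10:12:51 bdio20 kernel: [233440.672738] ceph: skipping osd22 192.168.168.13:6804 seq 1, expected 2",
]

for event in classify(SAMPLE):
    print(event)
```

A bad_crc event showing up just before a run of skip events for the same osd is the pattern seen in this report.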
> Something originally caused the seq's to get out of sync. I suspect it
> was a transient network error that made the TCP session drop and
> reconnect, and it's not skipping already-received messages. There was a
> bug in the skip code (so they stayed out of sync and osd22 eventually
> timed out). I pushed a fix for that to the ceph-client.git master branch
> (df9f86fa).
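To make the skipping idea above concrete, here is a minimal sketch in Python of a receiver that drops messages it has already seen after a reconnect. It is only an illustration of the general technique, not the ceph messenger code and not the fix in df9f86fa; the class name Receiver, its fields, and the returned strings are all hypothetical:

```python
class Receiver:
    """Toy receiver that tracks the sequence number of the last delivered message."""

    def __init__(self):
        self.in_seq = 0       # seq of the last message delivered (hypothetical field)
        self.delivered = []   # payloads handed to the application

    def handle(self, seq, payload):
        expected = self.in_seq + 1
        if seq < expected:
            # Already received: the peer is resending after a reconnect, so drop it.
            return f"skipping seq {seq}, expected {expected}"
        if seq > expected:
            # A gap means a message was lost; treat it as an error.
            raise ValueError(f"bad seq {seq}, expected {expected}")
        self.in_seq = seq
        self.delivered.append(payload)
        return "delivered"


r = Receiver()
print(r.handle(1, "a"))   # first message is accepted
print(r.handle(1, "a"))   # resent copy is skipped
print(r.handle(2, "b"))   # next new message is accepted
```

The point of the sketch is that a skipped message must leave in_seq untouched, so that the next genuinely new message still matches the expected value.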
BTW, it looks like something may be unhappy? I tried doing a clone of
ceph-client.git, and I'm getting a failure:
% git clone git://ceph.newdream.net/git/ceph-client.git ceph-client
Cloning into ceph-client...
fatal: I don't handle protocol '/usr/local/google/git'
I downloaded df9f86fa and will try it out. Thanks for pushing out the
patch so quickly!
- Ted