On Fri, 12 Nov 2010, Christian Brunner wrote:
> Presumably I'm doing something wrong here, but I don't have a clue what to...
>
> After restarting one of our osd-servers I get the following messages
> in the cosd-log:
>
> 2010-11-12 10:24:31.965058 7f5bac380710 -- 10.255.0.60:6802/17175 >>
> 10.255.0.60:6800/15859 pipe(0x7f5b98089300 sd=26 pgs=0 cs=0
> l=0).connect claims to be 0.0.0.0:6800/17108 not
> 10.255.0.60:6800/15859 - wrong node!
> 2010-11-12 10:24:32.489423 7f5b955ea710 -- 10.255.0.60:6803/17175 >>
> 10.255.0.60:6801/17108 pipe(0x7f5b98000d40 sd=30 pgs=0 cs=0
> l=0).connect claims to be 0.0.0.0:6801/17108 not
> 10.255.0.60:6801/17108 - presumably this is the same node!
Hmm. Some of these messages come up normally, but this sequence doesn't
look quite right. What usually happens is:
 1. B restarts.
 2. A's connection to B drops.
 3. A reconnects to B's old address, reaches the new B, and gets 'wrong node!'.
 4. A gets a new osdmap with B's new address.
 5. A connects to the new B.
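The normal restart sequence above can be modeled in a few lines. This is a simplified Python sketch, not Ceph's messenger code: the addresses are hypothetical, `check_peer` and `restart_sequence` are illustrative names, and the 0.0.0.0 tolerance only mirrors the two outcomes visible in the quoted log messages ("wrong node!" vs. "presumably this is the same node!").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAddr:
    """Simplified peer address, printed as ip:port/nonce like in the logs."""
    ip: str
    port: int
    nonce: int  # assumed to change every time the daemon restarts

    def __str__(self) -> str:
        return f"{self.ip}:{self.port}/{self.nonce}"


def check_peer(expected: EntityAddr, claimed: EntityAddr) -> str:
    """Compare the address we dialed with the address the peer claims."""
    if claimed == expected:
        return "ok"
    # A peer that has not learned its own IP yet reports 0.0.0.0; tolerate
    # that if the port and nonce still match what we expected.
    if (claimed.ip == "0.0.0.0"
            and claimed.port == expected.port
            and claimed.nonce == expected.nonce):
        return "presumably this is the same node!"
    return "wrong node!"


def restart_sequence() -> list[str]:
    # Hypothetical addresses: B comes back on the same ip:port with a new nonce.
    old_b = EntityAddr("10.0.0.2", 6800, 100)
    new_b = EntityAddr("10.0.0.2", 6800, 200)

    osdmap = {"B": old_b}  # A's current map still lists the old B
    results = []

    # B restarts; A redials the old address and reaches the new B.
    results.append(check_peer(osdmap["B"], new_b))

    # A receives a newer osdmap carrying B's new address.
    osdmap = {"B": new_b}

    # A connects to the new B, and the identities now agree.
    results.append(check_peer(osdmap["B"], new_b))
    return results


print(restart_sequence())  # ['wrong node!', 'ok']
```

In the healthy case the only mismatch is the stale-map one, which clears as soon as the new osdmap arrives.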
What doesn't make sense to me here is that we then get a peer claiming to be
0.0.0.0:6801/17108, which would mean B doesn't yet know its address. But in
fact B must know it, because its address was already published in the map.
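The contradiction can be checked mechanically against the two quoted log lines. This Python sketch is only an illustration: the helper names are hypothetical, and treating the number after the slash as a per-process nonce is an assumption about the log format. It flags a peer that reports 0.0.0.0 even though the map already holds a real IP for the same port and nonce.

```python
from typing import NamedTuple


class Addr(NamedTuple):
    ip: str
    port: int
    nonce: int  # assumed per-process value printed after the "/"


def parse_addr(text: str) -> Addr:
    """Parse an 'ip:port/nonce' string as it appears in the cosd log."""
    hostport, nonce = text.rsplit("/", 1)
    ip, port = hostport.rsplit(":", 1)
    return Addr(ip, int(port), int(nonce))


def blank_despite_map(published: str, claimed: str) -> bool:
    """True when the peer reports a blank IP although the map already lists
    a real IP for that same port and nonce."""
    pub, got = parse_addr(published), parse_addr(claimed)
    same_peer = (pub.port, pub.nonce) == (got.port, got.nonce)
    return same_peer and got.ip == "0.0.0.0" and pub.ip != "0.0.0.0"


# The two "connect claims to be ... not ..." lines from the log,
# as (address from the map, address the peer claimed):
log_pairs = [
    ("10.255.0.60:6800/15859", "0.0.0.0:6800/17108"),
    ("10.255.0.60:6801/17108", "0.0.0.0:6801/17108"),
]
for published, claimed in log_pairs:
    print(published, claimed, blank_despite_map(published, claimed))
```

Only the second line trips the check: the first is the expected old-B versus new-B mismatch, while the second is the new B itself reporting a blank IP that the map already knows.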
Is this reproducible? Can you reproduce with

        debug ms = 20
        debug osd = 20

on the OSD, and

        debug mon = 20
        debug ms = 1

on the monitor, and send the logs from the mon and both OSDs?
Thanks!
sage
>
> The wrong node message is repeated a few more times.
>
> After this every write to the osd seems to block. What is the right
> way to handle this?
>
> Thanks,
> Christian
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>