On 2012. February 28. 08:16:34 Gregory Farnum wrote:
> 2012/2/28 Székelyi Szabolcs <szeke...@niif.hu>:
> > On 2012. February 27. 09:03:11 Sage Weil wrote:
> >> On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
> >> > whenever I restart osd.0 I see a pair of messages like
> >> > 
> >> > 2012-02-27 17:26:00.132666 mon.0 <osd_1_ip>:6789/0 106 : [INF] osd.0 <osd_0_ip>:6801/29931 failed (by osd.1 <osd_1_ip>:6806/20125)
> >> > 2012-02-27 17:26:21.074926 osd.0 <osd_0_ip>:6801/29931 1 : [WRN] map e370 wrongly marked me down or wrong addr
> >> > 
> >> > a couple of times. The situation stabilizes in a normal state after
> >> > about two minutes.
> >> > 
> >> > Should I worry about this? Maybe the first message is about the just
> >> > killed OSD, and the second comes from the new incarnation, and this is
> >> > completely normal? This is Ceph 0.41.
> >> 
> >> It's not normal.  Wido was seeing something similar, I think.  I suspect
> >> the problem is that during startup ceph-osd is just busy, but the
> >> heartbeat code is such that it's not supposed to miss them.
> >> 
> >> Can you reproduce this with 'debug ms = 1'?
> > 
> > Yes, I managed to. Output of ceph -w attached (with IP addresses
> > mangled). My setup is 3 nodes, node 1 and 2 running OSD, MDS and MON,
> > node 3 running MON only. I also have the logs from all nodes in case
> > you need them.
> 
> Yes, please. Just the cluster state is not very helpful — we want to
> see why the OSDs are marking each other down, not when. :)

Okay, it was a firewall issue. The port range that was allowed to reach the 
OSDs didn't include a number of necessary ports. It started working after a 
while because I also had a rule like

-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 

So after a restart osd.1 could not talk to osd.0 (because of the wrong port
range) until osd.0 itself started talking to osd.1, at which point the
-m state rule let the traffic through.
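
To make the mechanism concrete, here is a toy model of the firewall on
osd.0's host. It is only a sketch and not real netfilter code: the class
name, the allowed port range and all port numbers are hypothetical, chosen
for illustration. It shows that a new inbound connection to a port outside
the allowed range is dropped, while traffic on a connection that osd.0
opened itself is accepted by the RELATED,ESTABLISHED rule.

```python
class StatefulFirewall:
    """Toy model of an INPUT chain with a port range plus a state rule."""

    def __init__(self, allowed_ports):
        # Destination ports explicitly opened for new inbound connections.
        self.allowed_ports = set(allowed_ports)
        # Flows this host has initiated: (local_port, peer, peer_port).
        self.tracked = set()

    def outbound(self, local_port, peer, peer_port):
        # An outgoing connection creates a connection-tracking entry.
        self.tracked.add((local_port, peer, peer_port))

    def inbound_ok(self, local_port, peer, peer_port):
        # Equivalent of "-m state --state RELATED,ESTABLISHED -j ACCEPT".
        if (local_port, peer, peer_port) in self.tracked:
            return True
        # Otherwise only the explicitly allowed port range is accepted.
        return local_port in self.allowed_ports


# Hypothetical, too narrow range: osd.0 listens on 6810, which is outside it.
fw = StatefulFirewall(range(6800, 6804))

# osd.1 tries to open a new connection to osd.0:6810 and is dropped.
new_conn_blocked = not fw.inbound_ok(6810, "osd.1", 40000)

# osd.0 connects out to osd.1:6806 from an ephemeral port.
fw.outbound(50000, "osd.1", 6806)

# Traffic coming back on that same flow is now accepted.
reply_allowed = fw.inbound_ok(50000, "osd.1", 6806)

# A fresh connection from osd.1 to osd.0:6810 is still dropped.
still_blocked = not fw.inbound_ok(6810, "osd.1", 40001)

print(new_conn_blocked, reply_allowed, still_blocked)  # True True True
```

In this model the two OSDs can only exchange traffic over the connection
that osd.0 initiated, which matches the delay I saw after each restart.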

Sorry for the noise.

-- 
cc


