On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
> Hello,
> 
> whenever I restart osd.0 I see a pair of messages like
> 
> 2012-02-27 17:26:00.132666 mon.0 <osd_1_ip>:6789/0 106 : [INF] osd.0 
> <osd_0_ip>:6801/29931 failed (by osd.1 <osd_1_ip>:6806/20125)
> 2012-02-27 17:26:21.074926 osd.0 <osd_0_ip>:6801/29931 1 : [WRN] map e370 
> wrongly marked me down or wrong addr
> 
> a couple of times. The situation stabilizes in a normal state after about two 
> minutes.
> 
> Should I worry about this? Maybe the first message is about the just killed 
> OSD, and the second comes from the new incarnation, and this is completely 
> normal? This is Ceph 0.41.

It's not normal.  Wido was seeing something similar, I think.  I suspect 
the problem is that during startup ceph-osd is just busy, but the heartbeat 
code is designed so that it shouldn't miss heartbeats even then.  

Can you reproduce this with 'debug ms = 1'?
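
A minimal sketch of where that setting could go, assuming it is placed in 
the [osd] section of ceph.conf on the node running osd.0.  The section 
choice is an assumption; putting it under [global] would apply it to the 
other daemons as well.

```ini
[osd]
    ; messenger debug logging, as requested above
    ; (placement under [osd] is an assumption)
    debug ms = 1
```

The daemon reads this at startup, so restarting osd.0 (the step that 
triggers the messages anyway) should pick it up.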

sage
