On Thu, 27 Aug 2009, Sage Weil wrote:
> It looks like an unused part of the ip address struct isn't getting 
> zeroed.  Can you try this?

Never mind, that won't help.  :)

I'm trying to reproduce this now.  Knowing which version you're using will 
help.  Thanks!

sage



> 
> diff --git a/src/msg/msg_types.h b/src/msg/msg_types.h
> index 1830ce9..456085c 100644
> --- a/src/msg/msg_types.h
> +++ b/src/msg/msg_types.h
> @@ -130,6 +130,7 @@ struct entity_addr_t {
>      ipaddr.sin_family = AF_INET;
>    }
>    entity_addr_t(const ceph_entity_addr &v) {
> +    memset(&ipaddr, 0, sizeof(ipaddr));
>      erank = v.erank;
>      nonce = v.nonce;
>      ipaddr = v.ipaddr;
> 
> 
> 
> Also, what version are you using?  (`git-rev-parse HEAD`)
> 
> Thanks!
> sage
> 
> 
> 
> On Wed, 26 Aug 2009, Albert Ales wrote:
> 
> > Hi!
> > 
> > I have a cluster set up on two systems. I am having an issue where cmon is
> > crashing when I try to mount the ceph drive with 'mount -t ceph
> > 172.16.1.100:/ /mnt/ceph'
> > 
> > .100 and .101 are both set up as mon, osd and mds. I am trying to mount from
> > 172.16.1.101.
> > 
> > I attached the dump of running cmon with -D.
> > 
> > The debug output on 172.16.1.100 shows:
> > 12743.139859 1233515696 -- 172.16.1.100:6789/0/0 mon0 --> mon1 172.16.1.101:6789/0/0 -- paxos(pgmap lease lc 32 fc 22 pn 0 opn 0) -- ?+0 0x103d9638
> > 12743.141417 1225127088 -- 172.16.1.100:6789/0/0 mon0 <== mon1 172.16.1.101:6789/0/0 367 ==== paxos(pgmap lease_ack lc 32 fc 22 pn 0 opn 0) ==== 76+0+0 (1206066930 0 0) 0x103d9638
> > 12743.724464 1329591472 -- 172.16.1.100:6789/0/0 >> 172.16.1.101:28896/0/0 pipe(0x103d8f08 sd=10 pgs=0 cs=0).accept incoming lossy connection, kicking outgoing lossless 0x103d8b20
> > 12743.729136 1321202864 -- 172.16.1.100:6789/0/0 >> 172.16.1.101:28896/0/0 pipe(0x103d8b20 sd=10 pgs=16777216 cs=1).connect claims to be 0.0.0.0:28896/0/0 not 172.16.1.101:28896/0/0 - presumably this is the same node!
> > 12743.729345 1321202864 -- 172.16.1.100:6789/0/0 >> 172.16.1.101:28896/0/0 pipe(0x103d8b20 sd=10 pgs=16777216 cs=1).connect got RESETSESSION
> > msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::connect()':
> > msg/SimpleMessenger.cc:934: FAILED assert(connect_seq == reply.connect_seq)
> >  1: cmon(_Z18__ceph_assert_failPKcS0_iS0_+0x44) [0x102ffc88]
> >  2: cmon(_ZN15SimpleMessenger4Pipe7connectEv+0x13c0) [0x10164434]
> >  3: cmon(_ZN15SimpleMessenger4Pipe6writerEv+0xb4) [0x1016473c]
> >  4: cmon(_ZN15SimpleMessenger4Pipe6Writer5entryEv+0x28) [0x1017dcd4]
> >  5: cmon(_ZN6Thread11_entry_funcEPv+0x38) [0x10175fec]
> >  6: /lib/libpthread.so.0 [0xff95a6c]
> >  7: /lib/libc.so.6(clone+0x84) [0xfba5fac]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::connect()':
> > msg/SimpleMessenger.cc:934: FAILED assert(connect_seq == reply.connect_seq)
> >  1: cmon(_Z18__ceph_assert_failPKcS0_iS0_+0x44) [0x102ffc88]
> >  2: cmon(_ZN15SimpleMessenger4Pipe7connectEv+0x13c0) [0x10164434]
> >  3: cmon(_ZN15SimpleMessenger4Pipe6writerEv+0xb4) [0x1016473c]
> >  4: cmon(_ZN15SimpleMessenger4Pipe6Writer5entryEv+0x28) [0x1017dcd4]
> >  5: cmon(_ZN6Thread11_entry_funcEPv+0x38) [0x10175fec]
> >  6: /lib/libpthread.so.0 [0xff95a6c]
> >  7: /lib/libc.so.6(clone+0x84) [0xfba5fac]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > terminate called after throwing an instance of 'FailedAssertion*'
> > Aborted
> > 
> > I also tried to mount it from a host that is not in the cluster
> > (172.16.1.10) using cfuse, and the client responded with:
> > bound to 0.0.0.0:6800/29871/0, mounting ceph
> > 09.08.26 13:47:44.002212 3050806160 -- 0.0.0.0:6800/29871/0 >> 172.16.1.100:6789/0/0 pipe(0x88e5e90 sd=4 pgs=0 cs=0).connect claims to be 172.16.1.100:6789/0/0 not 172.16.1.100:6789/0/0 - wrong node!
> > 09.08.26 13:47:44.002581 3067591568 client-1 ms_handle_failure client_mount to mon0 172.16.1.100:6789/0/0
> > 09.08.26 13:47:44.002630 3067591568 client-1 ms_handle_reset on 172.16.1.100:6789/0/0
> > I am not sure what I could be doing wrong...
> > 
> > Albert.
> > 
> 
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
> trial. Simplify your report design, integration and deployment - and focus on 
> what you do best, core application coding. Discover what's new with 
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Ceph-devel mailing list
> Ceph-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ceph-devel
> 
> 
