Re: [corosync] [PATCH] Add a troubleshooting guide to corosync.conf.5

Dmitry Koterov Wed, 14 Jan 2015 08:22:19 -0800

> > Please also add a note that one should specify IP addresses in ringX_addr
> > directives, not a domain name. Else corosync does not work properly in UDPu
> > mode, and at the same time it does not say anything significant in its log
> > files. I've spent 4 hours recently trying to figure this out.
> >
>
> As I replied to you on the PCMK list, ringX_addr resolving should work as
> expected (I use only this configuration, and the same applies to most
> clusters created by pcs). Even if ringX_addr resolving were broken, it
> would not be something appropriate for "TROUBLESHOOTING"; it would be a
> bug to fix.
>
> Can you please attach corosync logs (ideally with debug enabled), so that
> we can find the root cause of the problem you are hitting?
>
Sure, here they are:
http://oss.clusterlabs.org/pipermail/pacemaker/2015-January/023320.html


The complete NON-WORKING corosync.conf is below (in the real file, "a.b.c.d"
is a plain IP address):

# THIS IS A NON-WORKING CONFIGURATION DUE TO non-IP addresses in ringX_addr!
totem {
    version: 2
    cluster_name: velvica
    secauth: on
    clear_node_high_bit: yes
    interface {
        ringnumber: 0
        bindnetaddr: a.b.c.d
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
    heartbeat_failures_allowed: 3
}
logging {
    fileline: off
    to_logfile: no
    to_syslog: yes
    debug: off
    timestamp: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
nodelist {
  node {
    ring0_addr: node1  # <-- seems not working, IP address is needed
  }
  node {
    ring0_addr: node2
  }
  node {
    ring0_addr: node3
  }
}
quorum {
    provider: corosync_votequorum
}


If I then replace node1, node2, and node3 with their IP addresses, everything
starts working. See the /var/log/syslog output at
http://oss.clusterlabs.org/pipermail/pacemaker/2015-January/023320.html
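
For anyone who wants to check their own file for the same pattern, here is a
minimal Python sketch that lists ringX_addr values which are not IP literals.
It is not part of corosync: the function names and the SAMPLE text are made up
for illustration, and the parsing is a simple per-line regular expression, not
corosync's own parser. It only flags host names; it does not prove that they
are the cause of the problem.

```python
import ipaddress
import re

# Matches lines such as "ring0_addr: node1" and ignores any trailing "# ..." note.
RING_ADDR_RE = re.compile(r"^\s*(ring\d+_addr)\s*:\s*([^\s#]+)")


def is_ip_literal(value):
    """Return True if value parses as an IPv4 or IPv6 address literal."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def non_ip_ring_addrs(conf_text):
    """Return (line_number, key, value) for each ringX_addr that is not an IP literal."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        match = RING_ADDR_RE.match(line)
        if match and not is_ip_literal(match.group(2)):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


# Hypothetical fragment; 192.0.2.12 is a documentation-only placeholder address.
SAMPLE = """\
nodelist {
  node {
    ring0_addr: node1
  }
  node {
    ring0_addr: 192.0.2.12
  }
}
"""

if __name__ == "__main__":
    for lineno, key, value in non_ip_ring_addrs(SAMPLE):
        print(f"line {lineno}: {key} is '{value}', not an IP literal")
```

To run it against a real file, read the file's text and pass it to
non_ip_ring_addrs() in place of SAMPLE.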



> > On Monday, January 5, 2015, Jan Pokorný <[email protected]> wrote:
> >
> >> (if you let me, some more in-line)
> >>
> >> On 05/01/15 16:20 +0000, Christine Caulfield wrote:
> >>> Looks good to me, thanks. I've fixed a few typos and pointed out a spurious
> >>> capital inline below
> >>>
> >>> On 05/01/15 14:39, Steven Dake wrote:
> >>>> Add a troubleshooting guide.  I'm sure other folks have some good stuff
> >>>> to put in here.  These are just the ones I know about :)
> >>>>
> >>>> Signed-off-by: Steven Dake <[email protected]>
> >>>> ---
> >>>>  man/corosync.conf.5 | 39 +++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 39 insertions(+)
> >>>>
> >>>> diff --git a/man/corosync.conf.5 b/man/corosync.conf.5
> >>>> index 8e774c1..16d84ca 100644
> >>>> --- a/man/corosync.conf.5
> >>>> +++ b/man/corosync.conf.5
> >>>> @@ -678,6 +678,45 @@ Native means one of shm or socket, depending on what is supported by OS. On syst
> >>>>  with support for both, SHM is selected. SHM is generally faster, but need to allocate
> >>>>  ring buffer file in /dev/shm.
> >>>>
> >>>> +.SH "TROUBLESHOOTING"
> >>>> +.TP
> >>>> +Ocassionally Corosync will not work with the default network.  Here are some
> >>     ^^^ Occasionally
> >>
> >>>> +common tips that people have used to find a working Corosync.
> >>>> +
> >>>> +.TP
> >>>> +Disable the firewall.  The firwall could block Corosync packets from reaching
> >>>                             ^^firewall
> >>>> +the network.
> >>>> +
> >>>> +.TP
> >>>> +Force IGMP v2.  Some modern switches do not support the kernel IGMP v3
> >>>> + protocol.  As a result, They will not properly register the cluster.  To do
> >>                              ^^^ they
> >>
> >>>> +this, simply run the command
> >>>> +
> >>>> +.BR sysctl -w net.ipv4.conf.all.force_igmp_version=2
> >>>> +
> >>>> +.TP
> >>>> +If on a routed network, set a larger ttl.  The TTL tells the routers how long
> >>>> +to let the packet multicast before dropping it permanently.  The Default ttl
> >>>                                                              ^^^ default
> >>
> >> (inconsistent casing of ttl/TTL)
> >>
> >>>> +is set to 1, which means the packet will drop after its first hop.  This will
> >>>> +not work well on a routed network.
> >>>> +
> >>>> +.TP
> >>>> +I use a VLAN and Corosync doesn't work.  If your using a VLAN, VLAN's shave the
> >>>                                            ^^^ you're             VLANs
> >>>
> >>>> +packet size available for Corosync to use in some cases. Corosync does not
> >>>> +automatically adjust to this change.  Set netmtu appropriately when using a
> >>>> +VLAN.
> >>>> +
> >>>> +.TP
> >>>> +If all else fails, use UDPU.  The authors implemented UDPU to solve the various
> >>>> +problems with multicast that plague modern switch implementations.  The UDPU
> >>>> +protocol was initially believed to be much slower but the reality after
> >>>> +implementation is that it doesn't make much difference.
> >>>> +
> >>>> +Even with UDPU you would be hard pressed to find a faster group messaging
> >>>> +system than Corosync.  The only downside of UDPU is it results in much more
> >>>> +packet copying across the network.
> >>>> +
> >>>> +
> >>>>  .SH "FILES"
> >>>>  .TP
> >>>>  /etc/corosync/corosync.conf
> >>
> >> --
> >> Jan
> >>
> >
> >
> >
> > _______________________________________________
> > discuss mailing list
> > [email protected]
> > http://lists.corosync.org/mailman/listinfo/discuss
> >
>
>

