Your message dated Mon, 15 Jun 2020 08:59:03 +0200
with message-id <[email protected]>
and subject line Re: Bug#962454: Link failures after upgrade to +deb10u1
has caused the Debian Bug report #962454,
regarding Link failures after upgrade to +deb10u1
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
962454: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962454
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Source: corosync
Version: 3.0.1-2+deb10u1
Severity: important

Hi,

Some weeks ago I upgraded corosync (3.0.1-2 -> 3.0.1-2+deb10u1) and
started to notice these messages in my nodes (two node cluster):
Jun  2 01:10:13 patty corosync[2346]:   [KNET  ] link: host: 2 link: 0 is down
Jun  2 01:10:13 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Jun  2 01:10:14 patty corosync[2346]:   [KNET  ] rx: host: 2 link: 0 is up
Jun  2 01:10:14 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun  3 03:11:07 patty corosync[2346]:   [KNET  ] link: host: 2 link: 1 is down
Jun  3 03:11:07 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun  3 03:11:08 patty corosync[2346]:   [KNET  ] rx: host: 2 link: 1 is up
Jun  3 03:11:08 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)

Notice that the failure happens on both links. One of the links is a
cross-over cable. The other uses a bond with two interfaces.

These errors are more common on one of the nodes than on the other.

Sometimes they match (both nodes log the link failure), but most of the
time only one node complains:

Jun  4 01:16:23 selma corosync[52890]:   [KNET  ] link: host: 1 link: 0 is down
Jun  4 01:16:23 selma corosync[52890]:   [KNET  ] host: host: 1 (passive) best link: 1 (pri: 1)
Jun  4 01:16:24 selma corosync[52890]:   [KNET  ] rx: host: 1 link: 0 is up
Jun  4 01:16:24 selma corosync[52890]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Jun  4 01:16:55 patty corosync[2346]:   [KNET  ] link: host: 2 link: 0 is down
Jun  4 01:16:55 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Jun  4 01:16:56 patty corosync[2346]:   [KNET  ] rx: host: 2 link: 0 is up
Jun  4 01:16:56 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
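
To put numbers on "more common on one node" and on which link flaps, the
"is down" events can be tallied per reporting node, peer host and link. What
follows is only a rough sketch: it assumes syslog-style lines shaped like the
excerpts above, and the function name count_link_downs is made up for
illustration (it is not part of corosync or knet). Messages whose "is down"
text has been wrapped onto a second line will not be matched.

```python
import re
from collections import Counter

# Pattern modelled on the log excerpts in this report (assumed format):
#   "Jun  2 01:10:13 patty corosync[2346]:   [KNET  ] link: host: 2 link: 0 is down"
DOWN_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<node>\S+)\s+corosync\[\d+\]:\s+"
    r"\[KNET\s*\]\s+link:\s+host:\s+(?P<peer>\d+)\s+link:\s+(?P<link>\d+)\s+is\s+down"
)


def count_link_downs(lines):
    """Count 'link ... is down' events per (reporting node, peer host id, link number)."""
    counts = Counter()
    for line in lines:
        m = DOWN_RE.match(line)
        if m:
            counts[(m["node"], int(m["peer"]), int(m["link"]))] += 1
    return counts
```

It takes any iterable of lines, for example an open syslog file, and returns a
Counter keyed by (node, peer, link), so the per-node and per-link totals can be
compared directly.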

Here's my config:
totem {
        version: 2
        cluster_name: web
        crypto_cipher: none
        crypto_hash: none
        interface {
                linknumber: 0
        }
        interface {
                linknumber: 1
        }
}
logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        debug: off
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
quorum {
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}
nodelist {
        node {
                name: patty
                nodeid: 1
                ring0_addr: 192.168.144.1
                ring1_addr: 10.10.1.5
        }
        node {
                name: selma
                nodeid: 2
                ring0_addr: 192.168.144.2
                ring1_addr: 10.10.1.6
        }
}


Any help is appreciated. Thanks,

Alberto


-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.6.0-1-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE= (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

--- End Message ---
--- Begin Message ---
On Thu, Jun 11, 2020 at 10:59:06AM +0200, Valentin Vidic wrote:
> On Mon, Jun 08, 2020 at 12:29:35PM +0200, Alberto Gonzalez Iniesta wrote:
> > Some weeks ago I upgraded corosync (3.0.1-2 -> 3.0.1-2+deb10u1) and
> > started to notice these messages in my nodes (two node cluster):
> > Jun  2 01:10:13 patty corosync[2346]:   [KNET  ] link: host: 2 link: 0 is down
> > Jun  2 01:10:13 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
> > Jun  2 01:10:14 patty corosync[2346]:   [KNET  ] rx: host: 2 link: 0 is up
> > Jun  2 01:10:14 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
> > Jun  3 03:11:07 patty corosync[2346]:   [KNET  ] link: host: 2 link: 1 is down
> > Jun  3 03:11:07 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
> > Jun  3 03:11:08 patty corosync[2346]:   [KNET  ] rx: host: 2 link: 1 is up
> > Jun  3 03:11:08 patty corosync[2346]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
> 
> Hi, can you confirm that downgrading to the previous version solves the
> link problem for you?
> 

Hi, downgrading did not fix this; if anything, it made the situation worse:

Jun 15 07:10:34 selma corosync[21723]:   [TOTEM ] Token has not been received in 750 ms
Jun 15 07:10:35 selma corosync[21723]:   [TOTEM ] A processor failed, forming new configuration.
Jun 15 07:10:35 selma corosync[21723]:   [TOTEM ] A new membership (1:72) was formed. Members
Jun 15 07:10:35 selma corosync[21723]:   [CPG   ] downlist left_list: 0 received
Jun 15 07:10:35 selma corosync[21723]:   [CPG   ] downlist left_list: 0 received
Jun 15 07:10:35 selma corosync[21723]:   [QUORUM] Members[2]: 1 2
Jun 15 07:10:35 selma corosync[21723]:   [MAIN  ] Completed service synchronization, ready to provide service.
---------
Jun 15 07:10:34 patty corosync[15095]:   [KNET  ] link: host: 2 link: 0 is down
Jun 15 07:10:34 patty corosync[15095]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 15 07:10:34 patty corosync[15095]:   [KNET  ] host: host: 2 has no active links
Jun 15 07:10:34 patty corosync[15095]:   [TOTEM ] Token has not been received in 36 ms
Jun 15 07:10:35 patty corosync[15095]:   [KNET  ] rx: host: 2 link: 0 is up
Jun 15 07:10:35 patty corosync[15095]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 15 07:10:35 patty corosync[15095]:   [TOTEM ] A new membership (1:72) was formed. Members
Jun 15 07:10:35 patty corosync[15095]:   [CPG   ] downlist left_list: 0 received
Jun 15 07:10:35 patty corosync[15095]:   [CPG   ] downlist left_list: 0 received
Jun 15 07:10:35 patty corosync[15095]:   [QUORUM] Members[2]: 1 2
Jun 15 07:10:35 patty corosync[15095]:   [MAIN  ] Completed service synchronization, ready to provide service.
---------

So I guess I'll have to look somewhere else (but I'm quite sure this
started after the upgrade to +deb10u1)...

Anyway, thanks and sorry for the noise.


-- 
Alberto Gonzalez Iniesta    | Formación, consultoría y soporte técnico
mailto/sip: [email protected] | en GNU/Linux y software libre
Encrypted mail preferred    | http://inittab.com

Key fingerprint = 5347 CBD8 3E30 A9EB 4D7D  4BF2 009B 3375 6B9A AA55

--- End Message ---
