Hi, comments inline:
On 9-3-2012 15:20, Kapetanakis Giannis wrote:
> On 08/03/12 18:17, Peter Hessler wrote:
>> On 2012 Mar 07 (Wed) at 15:58:21 +0200 (+0200), Kapetanakis Giannis wrote:
>> :Hi,
>> :
>> :I'm running a setup of Active/backup firewalls with carp/pfsync
>> :successfully for the last year.
>> :
>> :Today I've upgraded the primary firewall to the latest snapshot (12 Feb),
>> :and as soon as the firewall booted it became MASTER before pfsync
>> :bulk transfer completed.
>>
>>
>> Can you show this piece from the logs? Do you have additional logs?
>>
>> How are the interfaces connected, do you have a dedicated link for the
>> pfsync traffic?
>>
>> Can you also share your ruleset?
>
> Both firewalls are now upgraded to latest snapshot (12 Feb).
> I've managed to reproduce it by rebooting primary firewall a while ago.
>
> Logs are at the end.
>
> Firewalls use dedicated interface for pfsync ($sync_if).

Are they connected directly via a cable or is there a switch in between?

> For external and internal connectivity I use dedicated VLANs ($ext_if
> and $int_if)
>
> I'll show you relevant ruleset (due to internal policy I cannot share
> full ruleset).
>
> @0 match in all scrub (no-df max-mss 1440)
> @1 block drop quick inet6 all
> @2 pass quick on $sync_if all flags S/SA keep state (no-sync)
> @3 pass quick on $ext_if proto carp all keep state (no-sync)
> @4 pass quick on $int_if proto carp all keep state (no-sync)
> @5 pass quick on $other_vlan_if proto carp all keep state (no-sync)
> @6 pass quick on $other_vlan_if proto carp all keep state (no-sync)

I usually have "set skip" on the sync_if, if it's dedicated.
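
To make the "set skip" suggestion concrete, here is a minimal sketch. It assumes $sync_if is the same macro as in your ruleset and that the interface carries nothing but pfsync traffic; I have not seen your full pf.conf, so treat it as an illustration rather than a drop-in change.

```
# pf.conf fragment (sketch). Assumes $sync_if is defined earlier in the
# file and is used only for pfsync. This would take the place of the
# "pass quick on $sync_if" rule shown as @2.
set skip on $sync_if
```

With this in place pf does no filtering on that interface at all, so a ruleset mistake cannot get in the way of the pfsync packets.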

> Here is the reproducing and the logs from /var/log/messages from both
> firewalls
>
> firewall-1# ifconfig -g carp carpdemote
> firewall-1# sync;sync;reboot
>
> Mar 9 15:46:42 firewall-2 /bsd: carp0: state transition: BACKUP -> MASTER
> Mar 9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:46:42 firewall-2 /bsd: carp1: state transition: BACKUP -> MASTER
> Mar 9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:46:42 firewall-2 /bsd: carp2: state transition: BACKUP -> MASTER
> Mar 9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:46:42 firewall-2 /bsd: carp3: state transition: BACKUP -> MASTER
> Mar 9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value

Any idea what causes the arp_rtrequest errors? Are all your IP addresses and netmasks sane?

> Mar 9 15:47:00 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1 to 1 (pfsyncdev)
> Mar 9 15:47:00 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1 to 1 (pfsyncdev)
> Mar 9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1 to 2 (pfsync bulk start)
> Mar 9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1 to 2 (pfsync bulk start)
> Mar 9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1 to 1 (pfsyncdev)
> Mar 9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by -1 to 1 (pfsyncdev)
> Mar 9 15:47:09 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1 to 2 (pfsyncdev)
> Mar 9 15:47:09 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1 to 2 (pfsyncdev)
> Mar 9 15:47:11 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1 to 1 (pfsyncdev)
> Mar 9 15:47:11 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by -1 to 1 (pfsyncdev)
> Mar 9 15:47:26 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1 to 2 (pfsyncdev)
> Mar 9 15:47:26 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1 to 2 (pfsyncdev)
> Mar 9 15:47:29 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1 to 1 (pfsyncdev)
> Mar 9 15:47:29 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by -1 to 1 (pfsyncdev)
> Mar 9 15:48:39 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1 to 2 (pfsyncdev)
> Mar 9 15:48:39 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1 to 2 (pfsyncdev)
> Mar 9 15:48:43 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1 to 1 (pfsyncdev)
> Mar 9 15:48:43 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by -1 to 1 (pfsyncdev)
> Mar 9 15:49:08 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1 to 2 (pfsyncdev)
> Mar 9 15:49:08 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1 to 2 (pfsyncdev)
> Mar 9 15:49:11 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1 to 1 (pfsyncdev)
> Mar 9 15:49:11 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by -1 to 1 (pfsyncdev)
> Mar 9 15:49:16 firewall-2 /bsd: carp1: state transition: MASTER -> BACKUP
> Mar 9 15:49:16 firewall-2 /bsd: carp0: state transition: MASTER -> BACKUP
> Mar 9 15:49:16 firewall-2 /bsd: carp3: state transition: MASTER -> BACKUP
> Mar 9 15:49:16 firewall-2 /bsd: carp2: state transition: MASTER -> BACKUP
>
> Mar 9 15:49:10 firewall-1 /bsd: root on sd0a (45a94a78df33ffd9.a) swap on sd0b dump on sd0b
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp0 demoted group carp by 1 to 129 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp1 demoted group carp by 1 to 130 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp2 demoted group carp by 1 to 131 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp3 demoted group carp by 1 to 132 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group carp by 1 to 133 (pfsync bulk start)
> Mar 9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group pfsync by 1 to 1 (pfsync bulk start)
> Mar 9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group carp by -1 to 132 (pfsyncdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group pfsync by -1 to 0 (pfsyncdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp1 demoted group carp by -1 to 131 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp0 demoted group carp by -1 to 130 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp3 demoted group carp by -1 to 129 (carpdev)
> Mar 9 15:49:10 firewall-1 /bsd: carp: carp2 demoted group carp by -1 to 128 (carpdev)
>
> (Firewall-1 is taking over prior to pfsync bulk transfer)

While heavily demoted, it still assumes the master role. I guess it's not seeing the carp announcements from firewall-2 at all. Do you use spanning tree in the network?

> Mar 9 15:49:12 firewall-1 /bsd: carp1: state transition: BACKUP -> MASTER
> Mar 9 15:49:12 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:49:12 firewall-1 /bsd: carp0: state transition: BACKUP -> MASTER
> Mar 9 15:49:12 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:49:13 firewall-1 /bsd: carp3: state transition: BACKUP -> MASTER
> Mar 9 15:49:13 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:49:13 firewall-1 /bsd: carp2: state transition: BACKUP -> MASTER
> Mar 9 15:49:13 firewall-1 /bsd: arp_rtrequest: bad gateway value
>
> Manually enforce BACKUP mode
> firewall-1# ifconfig -g carp carpdemote

Here it gets weird... it's already at demote=128, so adding one more shouldn't help. I suspect it would have gone to backup anyway.

> Mar 9 15:49:31 firewall-1 /bsd: carp1: state transition: MASTER -> BACKUP
> Mar 9 15:49:31 firewall-1 /bsd: carp0: state transition: MASTER -> BACKUP
> Mar 9 15:49:31 firewall-1 /bsd: carp2: state transition: MASTER -> BACKUP
> Mar 9 15:49:31 firewall-1 /bsd: carp3: state transition: MASTER -> BACKUP

This is around 30 seconds after the first boot message... sounds like the switch again that blocks the traffic on the port for 30 seconds.

> (Firewall-1 is again taking over prior to pfsync bulk transfer)
>
> Mar 9 15:49:32 firewall-1 /bsd: carp0: state transition: BACKUP -> MASTER
> Mar 9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:49:32 firewall-1 /bsd: carp2: state transition: BACKUP -> MASTER
> Mar 9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:49:32 firewall-1 /bsd: carp1: state transition: BACKUP -> MASTER
> Mar 9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:49:32 firewall-1 /bsd: carp3: state transition: BACKUP -> MASTER
> Mar 9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar 9 15:51:04 firewall-1 /bsd: carp: pfsync0 demoted group carp by -1 to 0 (pfsync bulk done)
> Mar 9 15:51:04 firewall-1 /bsd: carp: pfsync0 demoted group pfsync by -1 to 0 (pfsync bulk done)

How many states do you typically have? The bulk pfsync is taking a really long time here... 4 minutes. Any errors on the pfsync interface? What speed is it?

> and from daemon from firewall1 where ifstated is running
>
> Mar 9 15:49:10 firewall-1 savecore: no core dump
> Mar 9 15:49:11 firewall-1 ifstated[29503]: initial state: auto
> Mar 9 15:49:11 firewall-1 ifstated[29503]: changing state to auto
> Mar 9 15:49:11 firewall-1 ifstated[29503]: changing state to backup
> Mar 9 15:49:19 firewall-1 ifstated[29503]: started
> Mar 9 15:49:31 firewall-1 ifstated[29503]: changing state to promoted
> Mar 9 15:49:31 firewall-1 ifstated[29503]: changing state to primary

What does your ifstated.conf look like?

