Hi,

Comments inline:

On 9-3-2012 15:20, Kapetanakis Giannis wrote:
> On 08/03/12 18:17, Peter Hessler wrote:
>> On 2012 Mar 07 (Wed) at 15:58:21 +0200 (+0200), Kapetanakis Giannis
>> wrote:
>> :Hi,
>> :
>> :I'm running a setup of Active/backup firewalls with carp/pfsync
>> :successfully for the last year.
>> :
>> :Today I've upgraded the primary firewall to the latest snapshot (12
>> Feb),
>> :and as soon as the firewall booted it became MASTER before pfsync
>> :bulk transfer completed.
>>
>>
>> Can you show this piece from the logs?  Do you have additional logs?
>>
>> How are the interfaces connected, do you have a dedicated link for the
>> pfsync traffic?
>>
>> Can you also share your ruleset?
> 
> Both firewalls are now upgraded to latest snapshot (12 Feb).
> I've managed to reproduce it by rebooting primary firewall a while ago.
> 
> Logs are at the end.
> 
> Firewalls use dedicated interface for pfsync ($sync_if).

Are they connected directly via a cable or is there a switch in between?

> For external and internal connectivity I use dedicated VLANs ($ext_if
> and $int_if)
> 
> I'll show you relevant ruleset (due to internal policy I cannot share
> full ruleset).
> 
> @0 match in all scrub (no-df max-mss 1440)
> @1 block drop quick inet6 all
> @2 pass quick on $sync_if all flags S/SA keep state (no-sync)
> @3 pass quick on $ext_if proto carp all keep state (no-sync)
> @4 pass quick on $int_if proto carp all keep state (no-sync)
> @5 pass quick on $other_vlan_if proto carp all keep state (no-sync)
> @6 pass quick on $other_vlan_if proto carp all keep state (no-sync)

I usually use "set skip on $sync_if" when the sync interface is dedicated.
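
For reference, a minimal sketch of what that could look like in pf.conf,
assuming $sync_if is the same macro used in the quoted rules and that the
link carries nothing but pfsync traffic. With the interface skipped, pf
does not evaluate any rules on it, so a pass rule like @2 would no longer
be needed:

```
# Options are usually kept near the top of pf.conf, ahead of the rules.
# $sync_if is assumed to be the dedicated pfsync interface.
set skip on $sync_if
```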

> Here is the reproducing and the logs from /var/log/messages from both
> firewalls
> 
> firewall-1# ifconfig -g carp carpdemote
> firewall-1# sync;sync;reboot
> 
> Mar  9 15:46:42 firewall-2 /bsd: carp0: state transition: BACKUP -> MASTER
> Mar  9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:46:42 firewall-2 /bsd: carp1: state transition: BACKUP -> MASTER
> Mar  9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:46:42 firewall-2 /bsd: carp2: state transition: BACKUP -> MASTER
> Mar  9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:46:42 firewall-2 /bsd: carp3: state transition: BACKUP -> MASTER
> Mar  9 15:46:42 firewall-2 /bsd: arp_rtrequest: bad gateway value

Any idea what causes the arp_rtrequest errors?  Are all your IP
addresses and netmasks sane?

> Mar  9 15:47:00 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1
> to 1 (pfsyncdev)
> Mar  9 15:47:00 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 1 (pfsyncdev)
> Mar  9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1
> to 2 (pfsync bulk start)
> Mar  9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 2 (pfsync bulk start)
> Mar  9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1
> to 1 (pfsyncdev)
> Mar  9 15:47:02 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 1 (pfsyncdev)
> Mar  9 15:47:09 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1
> to 2 (pfsyncdev)
> Mar  9 15:47:09 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 2 (pfsyncdev)
> Mar  9 15:47:11 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1
> to 1 (pfsyncdev)
> Mar  9 15:47:11 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 1 (pfsyncdev)
> Mar  9 15:47:26 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1
> to 2 (pfsyncdev)
> Mar  9 15:47:26 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 2 (pfsyncdev)
> Mar  9 15:47:29 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1
> to 1 (pfsyncdev)
> Mar  9 15:47:29 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 1 (pfsyncdev)
> Mar  9 15:48:39 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1
> to 2 (pfsyncdev)
> Mar  9 15:48:39 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 2 (pfsyncdev)
> Mar  9 15:48:43 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1
> to 1 (pfsyncdev)
> Mar  9 15:48:43 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 1 (pfsyncdev)
> Mar  9 15:49:08 firewall-2 /bsd: carp: pfsync0 demoted group carp by 1
> to 2 (pfsyncdev)
> Mar  9 15:49:08 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 2 (pfsyncdev)
> Mar  9 15:49:11 firewall-2 /bsd: carp: pfsync0 demoted group carp by -1
> to 1 (pfsyncdev)
> Mar  9 15:49:11 firewall-2 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 1 (pfsyncdev)
> Mar  9 15:49:16 firewall-2 /bsd: carp1: state transition: MASTER -> BACKUP
> Mar  9 15:49:16 firewall-2 /bsd: carp0: state transition: MASTER -> BACKUP
> Mar  9 15:49:16 firewall-2 /bsd: carp3: state transition: MASTER -> BACKUP
> Mar  9 15:49:16 firewall-2 /bsd: carp2: state transition: MASTER -> BACKUP
> 
> Mar  9 15:49:10 firewall-1 /bsd: root on sd0a (45a94a78df33ffd9.a) swap
> on sd0b dump on sd0b
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp0 demoted group carp by 1 to
> 129 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp1 demoted group carp by 1 to
> 130 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp2 demoted group carp by 1 to
> 131 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp3 demoted group carp by 1 to
> 132 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group carp by 1
> to 133 (pfsync bulk start)
> Mar  9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group pfsync by 1
> to 1 (pfsync bulk start)
> Mar  9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group carp by -1
> to 132 (pfsyncdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 0 (pfsyncdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp1 demoted group carp by -1 to
> 131 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp0 demoted group carp by -1 to
> 130 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp3 demoted group carp by -1 to
> 129 (carpdev)
> Mar  9 15:49:10 firewall-1 /bsd: carp: carp2 demoted group carp by -1 to
> 128 (carpdev)
> 
> (Firewall-1 is taking over prior to pfsync bulk transfer)

While heavily demoted, it still assumes the master role.  I guess it's
not seeing the carp announcements from firewall-2 at all.

Do you use spanning tree in the network?
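
One way to check whether the advertisements arrive is to watch for CARP
packets on each carpdev while the other box is master. A rough sketch,
assuming tcpdump is run as root; the function name watch_carp and the
interface name in the usage line are placeholders, not taken from the
setup above. CARP uses IP protocol 112, so the filter matches on that
number:

```shell
# Print CARP advertisements (IP protocol 112) seen on one interface.
# Usage: watch_carp <interface>
watch_carp() {
    tcpdump -ni "$1" ip proto 112
}
```

Running something like "watch_carp em0" on firewall-1 right after boot
should show advertisements from firewall-2. If none show up for the first
seconds, the switch port is probably not forwarding yet.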

> Mar  9 15:49:12 firewall-1 /bsd: carp1: state transition: BACKUP -> MASTER
> Mar  9 15:49:12 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:49:12 firewall-1 /bsd: carp0: state transition: BACKUP -> MASTER
> Mar  9 15:49:12 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:49:13 firewall-1 /bsd: carp3: state transition: BACKUP -> MASTER
> Mar  9 15:49:13 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:49:13 firewall-1 /bsd: carp2: state transition: BACKUP -> MASTER
> Mar  9 15:49:13 firewall-1 /bsd: arp_rtrequest: bad gateway value
> 
> Manually enforce BACKUP mode
> firewall-1# ifconfig -g carp carpdemote

Here it gets weird...  it's already at demote=128, so adding one more
shouldn't help.  I suspect it would have gone to backup anyway.

> Mar  9 15:49:31 firewall-1 /bsd: carp1: state transition: MASTER -> BACKUP
> Mar  9 15:49:31 firewall-1 /bsd: carp0: state transition: MASTER -> BACKUP
> Mar  9 15:49:31 firewall-1 /bsd: carp2: state transition: MASTER -> BACKUP
> Mar  9 15:49:31 firewall-1 /bsd: carp3: state transition: MASTER -> BACKUP

This is around 30 seconds after the first boot message...  sounds like
the switch again, blocking traffic on the port for 30 seconds.

> (Firewall-1 is again taking over prior to pfsync bulk transfer)
> 
> Mar  9 15:49:32 firewall-1 /bsd: carp0: state transition: BACKUP -> MASTER
> Mar  9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:49:32 firewall-1 /bsd: carp2: state transition: BACKUP -> MASTER
> Mar  9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:49:32 firewall-1 /bsd: carp1: state transition: BACKUP -> MASTER
> Mar  9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:49:32 firewall-1 /bsd: carp3: state transition: BACKUP -> MASTER
> Mar  9 15:49:32 firewall-1 /bsd: arp_rtrequest: bad gateway value
> Mar  9 15:51:04 firewall-1 /bsd: carp: pfsync0 demoted group carp by -1
> to 0 (pfsync bulk done)
> Mar  9 15:51:04 firewall-1 /bsd: carp: pfsync0 demoted group pfsync by
> -1 to 0 (pfsync bulk done)

How many states do you typically have?  The bulk pfsync is taking a
really long time here... 4 minutes.  Any errors on the pfsync interface?
What speed is it?
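
How long the bulk transfer took depends on which log lines are compared.
A small sketch that subtracts two of the quoted timestamps; the helper
name elapsed is made up for illustration, both times are assumed to fall
on the same day, and the clocks on the two firewalls are assumed to be in
sync:

```shell
# elapsed HH:MM:SS HH:MM:SS -> seconds between the two times (same day)
elapsed() {
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
    }'
}

# firewall-1 "pfsync bulk start" to firewall-1 "pfsync bulk done"
elapsed 15:49:10 15:51:04    # prints 114
# firewall-2 "pfsync bulk start" to firewall-1 "pfsync bulk done"
elapsed 15:47:02 15:51:04    # prints 242
```

That is a bit under 2 minutes by firewall-1's own log, and about 4
minutes when counted from the bulk start that firewall-2 logged.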

> and from daemon from firewall1 where ifstated is running
> 
> Mar  9 15:49:10 firewall-1 savecore: no core dump
> Mar  9 15:49:11 firewall-1 ifstated[29503]: initial state: auto
> Mar  9 15:49:11 firewall-1 ifstated[29503]: changing state to auto
> Mar  9 15:49:11 firewall-1 ifstated[29503]: changing state to backup
> Mar  9 15:49:19 firewall-1 ifstated[29503]: started
> Mar  9 15:49:31 firewall-1 ifstated[29503]: changing state to promoted
> Mar  9 15:49:31 firewall-1 ifstated[29503]: changing state to primary

What does your ifstated.conf look like?
