On Thu, Feb 21, 2013 at 2:08 PM, sangdrax8 <[email protected]> wrote:

> I am new to OpenBSD, but would like to take advantage of a redundant
> setup with ipsec/carp/sasync.  I have run into a situation which seems
> to be a bug, but thought it best if I first bring my questions here to
> see if there is something I am missing.
>
> I have tried the following with 5.1-stable, 5.2-stable, and my
> 5.2-stable setup with a snapshot kernel from 2/17/2013.  My main problem
> exists across all three setups.  My guess is that phase 1 of the ipsec
> negotiation is not being synced by sasync, but I will
> describe my setup and results below and see if anyone else can assist me
> with this.
>
>
> My setup:
> fw1 and fw2 - carp/ipsec/sasync
> lab1 - ipsec
>
> Part that works as I expected it to:
>
> My fw1 and fw2 boxes are successfully running carp, and my fw1 is the
> master.  Using a machine behind the firewalls, I can initiate the ipsec
> tunnel by sending some icmp packets to a machine behind the lab1 box.
> While tcpdumping on the fw1 and fw2 interfaces, I can see the phase1 and
> phase2 of ipsec happen on fw1, and esp traffic passing.  I then verify
> sasync by running 'ipsecctl -s a' on both fw1 and fw2.  They both match,
> indicating that the SA created by the master did make it to the backup
> machine.
>
> I then wish to test failover between the two redundant firewalls, so I
> run 'ifconfig -g carp carpdemote 128' on the master machine.  I quickly
> see the backup take over, and the esp packets start showing up on my
> tcpdump on the backup machine.  I see the sequence numbers jump by
> 16384, which I have read is expected. (side note, this increase causes
> the tunnel to break in 5.2-stable, but was reported and seems fixed in
> my snapshot kernel tests, as well as working in 5.1-stable)  Initially
> this looks good, and even the spi's in use are the same.  So again
> sasync seems to be working, and I have a successful tunnel transition.
>
> Where things seem to go wrong:
>
> At this point if I keep watching the tcpdump on my fw2 (now the master
> passing traffic) I see that about one or two minutes after it takes
> over, it initiates a phase 1 re-key of the ipsec tunnel (and therefore a
> new phase 2 under this new phase1).  This happens quickly, and I can see
> the spi's change as the new association is now the one being used.  This
> re-key also resets the previously mentioned sequence numbers, making it
> easy to see when it took place.  I think things have gone wrong here,
> but traffic passes and will continue to re-key new phase 2 just fine.
> So it isn't obvious that anything is wrong.
>
> Evidence something is wrong:
>
> I now allow fw1 to take back over master with 'ifconfig -g carp
> -carpdemote 128' which also works.  I see the traffic now on my fw1
> tcpdump window, and the spi's are the ones that were re-negotiated by
> the backup when it did the strange phase 1 and phase 2 rekey.  Once again
> my sequence numbers jump by 16384, as expected.  Now watching the
> tcpdump on fw1, I see that about one or two minutes in it attempts a
> re-key, but not exactly like the backup one did when it took over.  It
> only initiates a phase 2 re-key with the remote host.  This re-key is
> attempted a few times, but always seems rejected by the lab1 side.
> After waiting the default of nearly 20 minutes for phase 2 to expire,
> the fw1 begins trying to get a phase 2 re-key again only to be denied
> again by the lab machine.  Eventually the phase 2 expires, and all
> traffic dies across the VPN.  It will stay dead, trying to re-key phase
> 2 and being rejected by the lab1 machine.
>
> My best guess as to what is going on:
>
> So from the above sequence I am guessing that the sasync isn't actually
> syncing a phase 1 between the fw1 and fw2.  Once the fw2 takes over, it
> decides to re-key the phase 2 (perhaps due to high sequence numbers?)
> but finds it has no valid phase 1 with which to talk to the lab machine.
> It therefore initiates a new phase 1 negotiation with the lab machine,
> which succeeds.  It follows this up with a phase 2, and traffic
> continues to pass between these two boxes.  Now in this current state it
> would (I am guessing here) imply that the fw1 has a non-expired phase 1
> association with the lab box, which the lab box has replaced with a
> newly negotiated phase 1 from fw2.  If fw2 tries to re-key phase 2,
> everything works since fw2 and the lab box now agree on the phase 1
> between them.  When I then allow fw1 to take back over as master, it
> attempts to re-key phase 2 (again maybe due to sequence numbers?) but is
> apparently rejected by lab1.  Since this phase 2 synced, traffic
> continues but eventually the writing is on the wall.  Once this phase 2
> that was synced from fw2 expires, all traffic dies.  Fw1 will not be
> able to get a new phase 2 until the phase 1 expires and it re-keys phase
> 1 with the lab box.  The nail in the coffin for me was that once nothing
> will pass, if I demote fw1 again and let fw2 take over, the phase 2
> re-builds and traffic will begin passing.  This again makes me think
> that the only valid phase 1 is between fw2 and the lab1 box.  Finally, I
> rebooted fw1.  This cleared all SA's (aka the phase 1 that I believe it
> still had).  When it came back up it took over as carp master, traffic
> dropped for a short time while it re-built a phase 1 and phase 2, and
> then traffic began passing again.
>
> This is easier to test on the snapshot kernel, because 5.1 doesn't seem
> to support adjusting the timelimit for the phase 1 and 2 SA's.  I did
> see this behavior in 5.1; it just took longer to test.
>
> I realize this is a long post, but I wanted to get some opinions before
> just filing a bug report.  This is my first real attempt at getting
> synchronized OpenBSD encryption devices running, and I would prefer if
> someone else could verify what I am seeing.
>
>
> "Faithless is he, who says 'farewell', when the path darkens."
> "you just keep on trying till you run out of cake"
>
>
Are you replicating between different versions?
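
If it helps to answer that, here is a minimal sketch for checking both
boxes.  It assumes the output of 'uname -r' and 'ipsecctl -s sa' from fw1
and fw2 has been saved to local files; the file names and the same_sas
helper are made up for illustration and are not part of any OpenBSD tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two saved 'ipsecctl -s sa' dumps
# contain the same lines, ignoring the order in which they were printed.
same_sas() {
    [ "$(sort "$1")" = "$(sort "$2")" ]
}

# Example use (file names are assumptions):
#   fw1# uname -r > fw1.version; ipsecctl -s sa > fw1.sa
#   fw2# uname -r > fw2.version; ipsecctl -s sa > fw2.sa
#   cmp -s fw1.version fw2.version || echo "versions differ"
#   same_sas fw1.sa fw2.sa || echo "SAs differ"
```

Note that this only compares the kernel SAs (phase 2) that ipsecctl
shows, so a match says nothing about the phase 1 state held by each box.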

-- 
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\
