On Thu, Feb 21, 2013 at 2:08 PM, sangdrax8 <[email protected]> wrote:
> I am new to OpenBSD, but would like to take advantage of a redundant
> setup with ipsec/carp/sasync. I have run into a situation which seems
> to be a bug, but thought it best to first bring my questions here to
> see if there is something I am missing.
>
> I have tried the following with 5.1-stable, 5.2-stable, and my
> 5.2-stable setup with a snapshot kernel from 2/17/2013. My main problem
> exists across all three setups. My guess is that the phase 1 of an
> ipsec negotiation is not being synced with sasync, but I will
> describe my setup and results below and see if anyone else can assist me
> with this.
>
>
> My setup:
> fw1 and fw2 - carp/ipsec/sasync
> lab1 - ipsec
>
> Part that works as I expected it to:
>
> My fw1 and fw2 boxes are successfully running carp, and my fw1 is the
> master. Using a machine behind the firewalls, I can initiate the ipsec
> tunnel by sending some icmp packets to a machine behind the lab1 box.
> While tcpdumping on the fw1 and fw2 interfaces, I can see phase 1 and
> phase 2 of ipsec happen on fw1, and esp traffic passing. I then verify
> sasync by running 'ipsecctl -s a' on both fw1 and fw2. They both match,
> indicating that the SA created by the master did make it to the backup
> machine.
>
> I then wish to test failover between the two redundant firewalls, so I
> run 'ifconfig -g carp carpdemote 128' on the master machine. I quickly
> see the backup take over, and the esp packets start showing up on my
> tcpdump on the backup machine. I see the sequence numbers jump by
> 16384, which I have read is expected. (Side note: this increase causes
> the tunnel to break in 5.2-stable, but was reported and seems fixed in
> my snapshot kernel tests, as well as working in 5.1-stable.) Initially
> this looks good, and even the SPIs in use are the same. So again
> sasync seems to be working, and I have a successful tunnel transition.
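
A minimal sketch for making the "they both match" check above repeatable:
it pulls the SPIs out of two saved 'ipsecctl -s sa' dumps and compares
them as sets. The function names and file paths are made up for
illustration, and it assumes each SA line contains the word "spi"
followed by the SPI value. As far as I know this output only covers the
kernel (phase 2) SAs, so a match here says nothing about whether a
phase 1 SA exists on the backup.

```shell
#!/bin/sh
# Print the sorted, de-duplicated SPIs found in a saved `ipsecctl -s sa` dump.
# Assumption: every SA line has the token "spi" followed by the SPI value.
spis() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "spi") print $(i + 1) }' "$1" | sort -u
}

# Succeed only if both dumps list exactly the same set of SPIs.
same_spis() {
    [ "$(spis "$1")" = "$(spis "$2")" ]
}

# Usage (file names are placeholders):
#   ipsecctl -s sa > /tmp/fw1.sa     # run on fw1
#   ipsecctl -s sa > /tmp/fw2.sa     # run on fw2, then copy one file across
#   same_spis /tmp/fw1.sa /tmp/fw2.sa && echo "SPIs match" || echo "SPIs differ"
```

Running the comparison before the demotion, right after failover, and
again a few minutes later should show when the SPI sets diverge.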
>
> Where things seem to go wrong:
>
> At this point if I keep watching the tcpdump on my fw2 (now the master
> passing traffic) I see that about one or two minutes after it takes
> over, it initiates a phase 1 re-key of the ipsec tunnel (and therefore a
> new phase 2 under this new phase 1). This happens quickly, and I can see
> the SPIs change as the new association is now the one being used. This
> re-key also resets the previously mentioned sequence numbers, making it
> easy to see when it took place. I think things have gone wrong here,
> but traffic passes and will continue to re-key new phase 2 just fine.
> So it isn't obvious that anything is wrong.
>
> Evidence something is wrong:
>
> I now allow fw1 to take back over as master with
> 'ifconfig -g carp -carpdemote 128', which also works. I see the traffic
> now on my fw1 tcpdump window, and the SPIs are the ones that were
> re-negotiated by the backup when it did the strange phase 1 and phase 2
> re-key. Once again my sequence numbers jump by 16384, as expected. Now
> watching the tcpdump on fw1, I see that about one or two minutes in it
> attempts a re-key, but not exactly like the one the backup did when it
> took over. It only initiates a phase 2 re-key with the remote host.
> This re-key is attempted a few times, but always seems rejected by the
> lab1 side. After waiting the default of nearly 20 minutes for phase 2
> to expire, fw1 begins trying to get a phase 2 re-key again, only to be
> denied again by the lab machine. Eventually the phase 2 expires, and all
> traffic dies across the VPN. It will stay dead, trying to re-key phase
> 2 and being rejected by the lab1 machine.
>
> My best guess as to what is going on:
>
> So from the above sequence I am guessing that sasync isn't actually
> syncing a phase 1 between fw1 and fw2. Once fw2 takes over, it
> decides to re-key the phase 2 (perhaps due to high sequence numbers?)
> but finds it has no valid phase 1 with which to talk to the lab machine.
> It therefore initiates a new phase 1 negotiation with the lab machine,
> which succeeds. It follows this up with a phase 2, and traffic
> continues to pass between these two boxes. In this state it would
> (I am guessing here) mean that fw1 has a non-expired phase 1
> association with the lab box, which the lab box has replaced with a
> newly negotiated phase 1 from fw2. If fw2 tries to re-key phase 2,
> everything works since fw2 and the lab box now agree on the phase 1
> between them. When I then allow fw1 to take back over as master, it
> attempts to re-key phase 2 (again maybe due to sequence numbers?) but is
> apparently rejected by lab1. Since this phase 2 synced, traffic
> continues, but the writing is on the wall. Once this phase 2
> that was synced from fw2 expires, all traffic dies. fw1 will not be
> able to get a new phase 2 until the phase 1 expires and it re-keys phase
> 1 with the lab box. The nail in the coffin for me was that once nothing
> will pass, if I demote fw1 again and let fw2 take over, the phase 2
> re-builds and traffic will begin passing. This again makes me think
> that the only valid phase 1 is between fw2 and the lab1 box. Finally, I
> rebooted fw1. This cleared all SAs (including the phase 1 that I
> believe it still had). When it came back up it took over as carp
> master, traffic dropped for a short time while it re-built a phase 1
> and phase 2, and then traffic began passing again.
>
> This is easier to test on the snapshot kernel, because 5.1 doesn't seem
> to support adjusting the time limit for the phase 1 and 2 SAs. I did
> see this behavior in 5.1; it just took longer to test.
>
> I realize this is a long post, but I wanted to get some opinions before
> just filing a bug report. This is my first real attempt at getting
> synchronized OpenBSD encryption devices running, and I would prefer if
> someone else could verify what I am seeing.
>
>
> "Faithless is he, who says 'farewell', when the path darkens."
> "you just keep on trying till you run out of cake"
>

Are you replicating between different versions?

--
---------------------------------------------------------------------------------------------------------------------
()  ascii ribbon campaign - against html e-mail
/\

