Hi list!

We're playing around with two 4.6 boxes running carp and relayd. We successfully got a basic DSR setup running, and it seems to be working fine. However, failing over to the secondary box does not work. In normal operation, all inbound packets go nicely through the box, and return packets go from the Linux server directly back to the router. Long-running sessions don't seem to be a problem either. The current setup consists of a clean PF, with only the anchor rules for relayd, and the following relayd conf:

fle_vip="10.0.0.40"
fle2="10.0.0.42"
table <fle> { $fle2 }

redirect fle {
        listen on $fle_vip port 443 interface vlan412
        route to <fle> check tcp interface vlan413
}

As simple as it gets! Anyway, as I said, that part works fine. The state I see after a connection has been established is the following:

all tcp 10.0.0.40:443 <- 192.168.0.1:50786       ESTABLISHED:ESTABLISHED
   [0 + 1]  [1035366774 + 2]
age 00:00:09, expires in 00:09:59, 514:0 pkts, 30678:0 bytes, anchor 0, rule 0, sloppy

I see this state on both boxes, so pfsync is working properly.
When I demote the master, and the backup takes over, the TCP connection gets terminated immediately. Looking at the state, it goes into TIME_WAIT on both boxes:

all tcp 10.0.0.40:443 <- 192.168.0.1:50786       TIME_WAIT:TIME_WAIT
  [0 + 1]  [1035366774 + 2]
age 00:00:18, expires in 00:02:59, 1221:0 pkts, 67459:0 bytes, anchor 0, rule 0, sloppy

Looking at the packets, I see the following on the incoming interface on the master before I fail over:

09:33:04.941171 192.68.0.1.50786 > 10.0.0.40.443: . ack 3071549 win 33124 <nop,nop,timestamp 501213487 164423091>
09:33:04.942591 192.68.0.1.50786 > 10.0.0.40.443: . ack 3072750 win 32523 <nop,nop,timestamp 501213487 164423091>

Those were the last packets seen before failover, and immediately after failover this is what I see on the slave:

09:33:05.601850 192.168.0.1.50786 > 10.0.0.40.443: . ack 4221828731 win 32448 <nop,nop,timestamp 501213494 164423759>
09:33:05.601865 10.0.0.40.13.443 > 192.168.0.1.50786: R 4221828731:4221828731(0) win 0 (DF)

followed by a bunch more of the same: ACKs answered with RSTs.

As there are no other rules in pf, nothing should be explicitly dropped, at least. I suspect something fishy with the states. I've tried pfctl -x loud, but it doesn't say anything.

Does anyone have any clues about what the problem could be? Googling doesn't give many hits on the subject, except for the undeadly article and the original commits, so I suspect there aren't that many users/experimenters of this yet. :)

Thanks for any input!

Best regards
Johan
