Re: pf/carp for redundant production use
On Sep 26, 2005, at 11:07 AM, Chad M Stewart wrote:
> On Sep 25, 2005, at 9:39 PM, Jason Dixon wrote:
>> On Sep 25, 2005, at 8:30 AM, Neil wrote:
>>> Yep, the same behavior when the master dies. The solution that the person in #pf told me is to use routing, but I don't know how to implement it. He told me that it's an issue in pf's NAT.
>>
>> 2) This is not tested, but I suspect that you should be able to use the new interface grouping features in 3.8 to simply assign multiple physical interfaces to the same group. Even if one fails, the other *should* maintain the MASTER state and avoid any partial-failure consequences. I'd love to hear from other users or developers who have tried the grouping feature in this sort of scenario.
>
> Can you share where one might read more about the interface grouping features of 3.8?

Sorry, I meant to refer to the new trunking features (man 4 trunk).

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net
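A trunk(4)-based version of suggestion 2 might look roughly like the sketch below. Nobody in this thread has tested it. It assumes a second NIC on the external segment (called fxp2 here, a hypothetical name), and it assumes your release's trunk(4) offers the failover protocol; check the man page before relying on either. The addresses and password are taken from Neil's FW1 setup.

```
# FW1, as root. fxp2 is a HYPOTHETICAL second NIC on the external segment.
# Member ports carry no address of their own, so move 192.168.1.1 off fxp1:
ifconfig fxp1 inet 192.168.1.1 delete
ifconfig fxp1 up
ifconfig fxp2 up

# Aggregate both ports. With "failover", traffic uses one port and moves
# to the other if that port's link goes down (assumed supported; see trunk(4)).
ifconfig trunk0 create
ifconfig trunk0 trunkproto failover trunkport fxp1 trunkport fxp2 \
    192.168.1.1 netmask 255.255.255.0 up

# Put the shared CARP address on the trunk instead of on fxp1 directly.
ifconfig carp1 create
ifconfig carp1 vhid 2 carpdev trunk0 pass netpasswd \
    192.168.1.100 netmask 255.255.255.0
```

The intent is that pulling one member's cable leaves trunk0, and therefore carp1, up, so the host never enters the partial-failure state discussed in this thread. A switch failure is only covered if the two ports connect to different switches.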
Re: pf/carp for redundant production use
On Sep 25, 2005, at 9:39 PM, Jason Dixon wrote:
> On Sep 25, 2005, at 8:30 AM, Neil wrote:
>> Yep, the same behavior when the master dies. [...]
>
> [...] However, if the failed interface returns from INIT to an available mode (we plug the cable in), we notice that the 2nd interface reclaims MASTER almost immediately, but the restored interface does not. It becomes a BACKUP host, which leaves us with a routing impossibility:

I agree, it is a routing impossibility. Last week I built a lab to test and build a new HA firewall. In my testing I did not see the 30-second delay people are reporting. Both CARP interfaces on the primary took over as MASTER within seconds of bringing the "failed" physical interface back online. I started a large file download over HTTP with everything running through the primary firewall. I then pulled a cable and watched the download. It slowed slightly, but went right back to its previous speed (like your scp demo at NYCBSDCON).

I actually disconnected and reconnected the cable a bunch of times, and the download never stopped.

I did notice one strange thing. I have 3 physical interfaces and two CARP interfaces on each firewall. If I was pinging the external (carp0) address and failed things over, say by doing 'ifconfig rl0 down', the ping would continue with zero packet loss. If I do the same thing on the internal (carp1), I see a small amount of packet loss. I don't really care about that, since most clients/people are not going to notice; I've already tested and know that downloads and similar traffic continue to work without a problem. Still, I found it strange that carp0 would have no packet loss while carp1 would. I did not investigate further to find out whether it was the hub/switch combination I'm using on the inside versus the external side.

> [...]
> 2) This is not tested, but I suspect that you should be able to use the new interface grouping features in 3.8 to simply assign multiple physical interfaces to the same group. [...]

Can you share where one might read more about the interface grouping features of 3.8? I'm using a snapshot from September 10th in my lab.

-Chad
Re: pf/carp for redundant production use
Hi Jason,

I would like to try your #1 suggestion, but unfortunately I don't know where to start. What programs do I need? What configuration? Is there an existing sample configuration online that I can follow? Thanks for explaining this in such detail.

Neil

Jason Dixon writes:
> On Sep 25, 2005, at 8:30 AM, Neil wrote:
>> Yep, the same behavior when the master dies. [...]
>
> [...]
> SOLUTION: 1) If you really are concerned about a partial system failure (unplugged cable, bad card, etc), then scrap the single CARP host/segment design and use arpbalance with multiple CARP hosts. The same partial-failure test using 2 CARP hosts on each segment with arpbalance resulted in a perfect failover and recovery with no packet loss. [...]
Re: pf/carp for redundant production use
On Sep 25, 2005, at 8:30 AM, Neil wrote:
> Yep, the same behavior when the master dies. The solution that the person in #pf told me is to use routing, but I don't know how to implement it. He told me that it's an issue in pf's NAT.

Bullshit. OK, here is the layman's description of the problem and the practical solution(s) to it. I'd love to be able to explain why interfaces recovering from INIT don't reclaim MASTER faster than they do (approx. 30 seconds in my tests), but I don't understand the code-level logistics of everything. Hint: this is only a problem when using single CARP hosts with preemption.

PROBLEM: With a simple CARP design using a single CARP host on each segment and preemption enabled, failover occurs as expected for any whole-system outage (server crashes, admin reboots, etc.). If a single interface goes from MASTER to INIT state (cable gets pulled, cable goes bad, card goes bad, etc.), the second interface on that system goes into BACKUP mode, as expected. Traffic routes across the new MASTER and continues to do so while the failed system is in an INIT/BACKUP state. However, when the failed interface returns from INIT to an available state (we plug the cable back in), the second interface reclaims MASTER almost immediately, but the restored interface does not. It becomes a BACKUP host, which leaves us with a routing impossibility:

    BACKUP       MASTER
    carp0        carp0
      |            |
    host1        host2
      |            |
    carp1        carp1
    MASTER       BACKUP

Any internal clients will attempt to send traffic through the "new gateway" (host1), although neither system has any way of routing the traffic properly (not without some hokey static routes bypassing the CARP hosts).

NOTE: I have found that the original MASTER does indeed return to the correct state, approximately 30 seconds later. This is reproducible, but YMMV.
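To make the "single CARP host per segment with preemption" design concrete, here is a minimal sketch reusing the interface names, addresses and placeholder passwords from Neil's FW1/FW2 configuration posted in this thread. As I understand the carp(4) page of this era, with preemption enabled a host whose physical interface goes down also raises the advskew of its other carp interfaces, failing all of them over; that is why the second interface drops to BACKUP along with the failed one.

```
# Both firewalls: enable preemption (add to /etc/sysctl.conf to persist)
sysctl net.inet.carp.preempt=1

# FW1 (preferred master): advskew left at its default of 0
ifconfig carp0 create
ifconfig carp0 vhid 1 carpdev xl0 pass lanpasswd 172.16.0.100 netmask 255.255.0.0
ifconfig carp1 create
ifconfig carp1 vhid 2 carpdev fxp1 pass netpasswd 192.168.1.100 netmask 255.255.255.0

# FW2 (backup): same vhids and passwords, higher advskew, its own NICs
ifconfig carp0 create
ifconfig carp0 vhid 1 carpdev rl0 pass lanpasswd advskew 128 172.16.0.100 netmask 255.255.0.0
ifconfig carp1 create
ifconfig carp1 vhid 2 carpdev ne3 pass netpasswd advskew 128 192.168.1.100 netmask 255.255.255.0
```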
SOLUTION:

1) If you really are concerned about a partial system failure (unplugged cable, bad card, etc.), scrap the single-CARP-host-per-segment design and use arpbalance with multiple CARP hosts. The same partial-failure test using two CARP hosts on each segment with arpbalance resulted in perfect failover and recovery with no packet loss.

2) This is not tested, but I suspect that you should be able to use the new interface grouping features in 3.8 to simply assign multiple physical interfaces to the same group. Even if one fails, the other *should* maintain the MASTER state and avoid any partial-failure consequences. I'd love to hear from other users or developers who have tried the grouping feature in this sort of scenario.

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net
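A sketch of suggestion 1 for the LAN segment, modeled loosely on the arpbalance example in carp(4) of this era: each firewall carries two CARP hosts (vhids) for the same shared address and is the preferred master for one of them, so ARP replies are split between the boxes. The second interface name (carp2), vhid 3, the advskew of 100 and the passwords are hypothetical; the address and carpdevs come from Neil's setup. As I understand it, arpbalance hashes on the source address of ARP requests, so it only spreads hosts on the local segment; behind a single upstream router on the external side, all traffic still lands on one box.

```
# Both firewalls
sysctl net.inet.carp.arpbalance=1

# FW1: master for vhid 1, backup for vhid 3 (carp2/vhid 3 are hypothetical)
ifconfig carp0 create
ifconfig carp0 vhid 1 carpdev xl0 pass lanpass1 172.16.0.100 netmask 255.255.0.0
ifconfig carp2 create
ifconfig carp2 vhid 3 carpdev xl0 pass lanpass3 advskew 100 172.16.0.100 netmask 255.255.0.0

# FW2: backup for vhid 1, master for vhid 3
ifconfig carp0 create
ifconfig carp0 vhid 1 carpdev rl0 pass lanpass1 advskew 100 172.16.0.100 netmask 255.255.0.0
ifconfig carp2 create
ifconfig carp2 vhid 3 carpdev rl0 pass lanpass3 172.16.0.100 netmask 255.255.0.0
```

The external segment would get the same treatment with its own pair of vhids.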
Re: pf/carp for redundant production use
On Sep 26, 2005, at 1:31 AM, Neil wrote:
> Hi Jason, I would like to try your #1 suggestion, but unfortunately I don't know where to start. What programs do I need? What configuration? Is there an existing sample configuration online that I can follow? Thanks for explaining this in such detail.

Please stop top-posting.

Always start at the man pages; there is an example given (man 4 carp). There is a similar configuration in my NYC BSD Con slides (http://www.dixongroup.net/NYCBSDCON/); see the "Advanced Example".

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net
Re: pf/carp for redundant production use
Neil wrote:
> Hi everyone, I just chatted with someone in #pf and found out that pf currently cannot maintain state on TCP connections from an internal machine to an external machine when the network cable on the master firewall's external interface is removed. Anyway, most connections come from outside to inside, and that is working well. :)

Is this person talking about state being kept on the backup firewall (which gets promoted to master when the master's cable is unplugged)? If so, that doesn't make any sense whatsoever.

.joel
Re: pf/carp for redundant production use
On 07:30, Sun 25 Sep 05, Neil wrote:
> Yep, the same behavior when the master dies. The solution that the person
> in #pf told me is to use routing, but I don't know how to implement it. He told me
> that it's an issue in pf's NAT.

Does this mean you cannot fail over an office NAT firewall? Pretty useless then, if you ask me.

--
Michiel van Baak
http://michiel.vanbaak.info
[EMAIL PROTECTED]
GnuPG key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x7E0B9A2D
"Why is it drug addicts and computer aficionados are both called users?"
Re: pf/carp for redundant production use
On 00:21, Sun 25 Sep 05, Neil wrote:
> Hi everyone,
>
> I just chatted with someone in #pf and found out that pf currently cannot
> maintain state on TCP connections from an internal machine to an external machine
> when the network cable on the master firewall's external interface is removed.
>
> Anyway, most connections come from outside to inside, and that is
> working well. :)

Is the same true when the master dies?

--
Michiel van Baak
http://michiel.vanbaak.info
[EMAIL PROTECTED]
GnuPG key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x7E0B9A2D
"Why is it drug addicts and computer aficionados are both called users?"
Re: pf/carp for redundant production use
Hi everyone,

I just chatted with someone in #pf and found out that pf currently cannot maintain state on TCP connections from an internal machine to an external machine when the network cable on the master firewall's external interface is removed.

Anyway, most connections come from outside to inside, and that is working well. :)

Neil writes:
> Hi Joel, I just created a new email post. :)
> Thanks, neil
>
> j knight writes:
>> Neil wrote:
>>> Yup, that fixed the inbound. [...]
>> Hard to say. What does your troubleshooting tell you? What does pflog tell you? What does the state table look like on the new master?
>> .joel
Re: pf/carp for redundant production use
Hi Joel,

I just created a new email post. :)

Thanks,
neil

j knight writes:
> Neil wrote:
>> Yup, that fixed the inbound. Now I tried an ssh connection from the internal machine to the external machine running OpenSSH and disconnected the cable; however, the ssh session was not able to recover. What should I change in my pf.conf configuration? Thanks for the first one. It's awesome! :D
>
> Hard to say. What does your troubleshooting tell you? What does pflog tell you? What does the state table look like on the new master?
>
> .joel
Re: pf/carp for redundant production use
Neil wrote:
> Yup, that fixed the inbound. Now I tried an ssh connection from the internal machine to the external machine running OpenSSH and disconnected the cable; however, the ssh session was not able to recover. What should I change in my pf.conf configuration? Thanks for the first one. It's awesome! :D

Hard to say. What does your troubleshooting tell you? What does pflog tell you? What does the state table look like on the new master?

.joel
Re: pf/carp for redundant production use
Yup, that fixed the inbound. Now I tried an ssh connection from the internal machine to the external machine running OpenSSH and disconnected the cable; however, the ssh session was not able to recover. What should I change in my pf.conf configuration?

Thanks for the first one. It's awesome! :D

j knight writes:
> Neil wrote:
>> Ok guys. I will do it tonight once I get home. I will also send my pf.conf file. Also, does it matter that I have different interfaces on FW1 and FW2? FW1: xl0, fxp0 and fxp1; FW2: rl0, fxp0 and ne3
>
> You're using 'set state-policy if-bound', so yes, that does matter. Remove that set option.
>
> .joel
Re: pf/carp for redundant production use
Neil wrote:
> Ok guys. I will do it tonight once I get home. I will also send my pf.conf file. Also, does it matter that I have different interfaces on FW1 and FW2?
> FW1: xl0, fxp0 and fxp1
> FW2: rl0, fxp0 and ne3

You're using 'set state-policy if-bound', so yes, that does matter. Remove that set option.

.joel
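A minimal sketch of the change Joel suggests. With if-bound, each state is tied to the interface it was created on, so (as I understand it) states that pfsync copies from FW1, created on xl0/fxp1, cannot match packets arriving on FW2's rl0/ne3. Removing the line falls back to pf's default floating policy, which can also be spelled out explicitly:

```
# /etc/pf.conf on both firewalls
# set state-policy if-bound   # remove: states would be bound to FW1's NIC names
set state-policy floating     # the default: a state may match on any interface
```

Reload the ruleset afterwards with pfctl -f /etc/pf.conf.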
Re: pf/carp for redundant production use
Hi everyone,

Firewall 1 troubleshooting info can be found at http://restricted.dyndns.org/pffw1.txt
Firewall 2: http://restricted.dyndns.org/pffw2.txt

The links include:
1. ifconfig output before/after cable removal
2. pfctl -s state before/after cable removal
3. pf.conf configs of both firewalls

Please let me know what you find.

Thanks in advance,
Neil

Matt Rowley writes:
>>> I got pf and carp working together. [...] Looks like it's related to pf state tables not being sent to the backup firewall.
>>
>> Show your entire pf.conf. Let's see some troubleshooting commands. Run ifconfig before and after pulling the cable, etc.
>
> pfctl -s state on the carp slave would also be helpful, to see if pfsync is getting through.
Re: pf/carp for redundant production use
Ok guys. I will do it tonight once I get home. I will also send my pf.conf file. Also, does it matter that I have different interfaces on FW1 and FW2?
FW1: xl0, fxp0 and fxp1
FW2: rl0, fxp0 and ne3

Thanks guys! ;)
Neil

Matt Rowley writes:
>>> I got pf and carp working together. [...] Looks like it's related to pf state tables not being sent to the backup firewall.
>>
>> Show your entire pf.conf. Let's see some troubleshooting commands. Run ifconfig before and after pulling the cable, etc.
>
> pfctl -s state on the carp slave would also be helpful, to see if pfsync is getting through.
Re: pf/carp for redundant production use
>> I got pf and carp working together. However, I have noticed that TCP-oriented applications don't recover well when I disconnect a cable. I set up a netcat listener on a machine inside the network, then ran netcat from another machine outside the network. I was able to connect and send some characters. However, when I disconnected the primary firewall's external interface, netcat stopped working until I ran netcat again against the shared external IP address. Am I missing any configuration? It looks like it's related to pf state tables not being sent to the backup firewall.
>
> Show your entire pf.conf. Let's see some troubleshooting commands. Run ifconfig before and after pulling the cable, etc.

pfctl -s state on the carp slave would also be helpful, to see if pfsync is getting through.
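For checking whether pfsync is getting through, here is a sketch of the sync-related configuration on FW1, using the interface names from Neil's setup (FW2 is the same apart from its own address and NIC names). The syncif keyword is the one shown in Neil's config; later releases may call it syncdev, so check pfsync(4) for your version. The pass rules assume a default-block ruleset, and in pf of this era state is only kept when a rule says keep state.

```
# Dedicated crossover link for pfsync (fxp0 on both boxes)
ifconfig fxp0 10.10.10.1 netmask 255.255.255.0 up
ifconfig pfsync0 syncif fxp0 up

# /etc/pf.conf fragment (FW1 names)
ext_if  = "fxp1"
int_if  = "xl0"
sync_if = "fxp0"

# pfsync state updates (IP protocol 240) arrive only on the sync link
pass quick on $sync_if proto pfsync
# CARP advertisements (IP protocol 112) must pass on every carpdev
pass on { $ext_if $int_if } proto carp keep state
```

With that in place, pfctl -s state on the backup should list roughly the same states as the master. If it stays empty, running tcpdump -n -i fxp0 on the backup shows whether any pfsync packets arrive at all.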
Re: pf/carp for redundant production use
Neil wrote:
> Hi guys,
> I got pf and carp working together. However, I have noticed that TCP-oriented applications don't recover well when I disconnect a cable. I set up a netcat listener on a machine inside the network, then ran netcat from another machine outside the network. I was able to connect and send some characters. However, when I disconnected the primary firewall's external interface, netcat stopped working until I ran netcat again against the shared external IP address. Am I missing any configuration? It looks like it's related to pf state tables not being sent to the backup firewall.

Show your entire pf.conf. Let's see some troubleshooting commands. Run ifconfig before and after pulling the cable, etc.

.joel
Re: pf/carp for redundant production use
Hi guys,

I got pf and carp working together. However, I have noticed that TCP-oriented applications don't recover well when I disconnect a cable. I set up a netcat listener on a machine inside the network, then ran netcat from another machine outside the network. I was able to connect and send some characters. However, when I disconnected the primary firewall's external interface, netcat stopped working until I ran netcat again against the shared external IP address. Am I missing any configuration? It looks like it's related to pf state tables not being sent to the backup firewall.

Please help.

Thanks,
Neil

Neil writes:
> Hi guys,
>
> I'm very new to carp. I last used OpenBSD and pf about 2 years ago, so I have forgotten a lot. Anyway, I just finished building 2 machines with 3 NICs each. I got CARP working as well, but have some questions. Here is my configuration:
>
> FW1:
>   external interface: fxp1 => 192.168.1.1/24
>   internal interface: xl0  => 172.16.0.1/16
>   pfsync interface:   fxp0 => 10.10.10.1/24
>   carp0: inet 172.16.0.100 255.255.0.0 172.16.255.255 carpdev xl0 vhid 1 pass lanpasswd
>   carp1: inet 192.168.1.100 255.255.255.0 192.168.1.255 carpdev fxp1 vhid 2 pass netpasswd
>   pfsync0: up syncif fxp0
>
> FW2:
>   external interface: ne3  => 192.168.1.2/24
>   internal interface: rl0  => 172.16.0.2/16
>   pfsync interface:   fxp0 => 10.10.10.2/24
>   carp0: inet 172.16.0.100 255.255.0.0 172.16.255.255 carpdev rl0 vhid 1 pass lanpasswd advskew 128
>   carp1: inet 192.168.1.100 255.255.255.0 192.168.1.255 carpdev ne3 vhid 2 pass netpasswd advskew 128
>   pfsync0: up syncif fxp0
>
> LAN shared IP: 172.16.0.100
> WAN/Internet shared IP: 192.168.1.100
>
> DIAGRAM:
>
>                  EXTERNAL
>      +--------| 192.168.1.x |--------+
>      |                               |
>     fxp1                            ne3
>   +-----+                         +-----+
>   | fw1 |-fxp0--10.10.10.x---fxp0-| fw2 |
>   +-----+                         +-----+
>     xl0                             rl0
>      |                               |
>      +--------| 172.16.x.x  |--------+
>                  INTERNAL
>
> 1. Let's say we want to do some NAT using the CARP/PF setup:
>      web server public: 192.168.1.10, NAT (real IP): 172.16.1.10
>      mail server public: 192.168.1.11, NAT (real IP): 172.16.1.11
>    a. How do I configure CARP?
>    b. How do I configure pf.conf on both firewalls? An example would really help me a lot.
>    c. Do I also have to create an alias interface on the two machines' external interfaces?
> 2. Can someone please send me a pf.conf that can be used in a production environment?
> 3. Am I correct that my internal mail server's and web server's gateway should point to 172.16.0.100?
> 4. What happens if the interface where pfsync is configured goes bad or its cable gets disconnected?
> 5. Other than this setup, is there anything I can add to make it more reliable?
>
> Thanks in advance!
> Neil
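Questions 1a to 1c were not answered in this part of the thread, so here is a hedged sketch of how the web/mail NAT might be laid out on FW1. Assumptions: the two public addresses are added as aliases on the shared carp1, so whichever firewall is MASTER answers ARP for them (assuming your release accepts alias addresses on carp interfaces; alternatively give each its own carp interface and vhid, or have the upstream router route them to 192.168.1.100); only HTTP/HTTPS and SMTP are published; the carp/pfsync pass rules are omitted; and the macro names are illustrative. Outbound NAT targets the shared CARP address rather than fxp1's own address, so translated connections keep the same source address after failover. Internal hosts would use the shared LAN address, 172.16.0.100, as their default gateway.

```
# Both firewalls: public service addresses as aliases on the WAN carp
ifconfig carp1 inet alias 192.168.1.10 netmask 255.255.255.255
ifconfig carp1 inet alias 192.168.1.11 netmask 255.255.255.255

# /etc/pf.conf on FW1 (on FW2: ext_if = "ne3", int_if = "rl0")
ext_if    = "fxp1"
int_if    = "xl0"
carp_ext  = "192.168.1.100"
web_pub   = "192.168.1.10"
web_priv  = "172.16.1.10"
mail_pub  = "192.168.1.11"
mail_priv = "172.16.1.11"

# Translation rules must come before filter rules in this pf version
nat on $ext_if from $int_if:network to any -> $carp_ext
rdr on $ext_if proto tcp from any to $web_pub port { 80 443 } -> $web_priv
rdr on $ext_if proto tcp from any to $mail_pub port 25 -> $mail_priv

block in log all
pass out keep state
# Filter rules see the already-redirected (internal) destination addresses
pass in on $ext_if proto tcp from any to $web_priv port { 80 443 } flags S/SA keep state
pass in on $ext_if proto tcp from any to $mail_priv port 25 flags S/SA keep state
pass in on $int_if from $int_if:network to any keep state
```

With this layout the servers' own outbound connections leave as 192.168.1.100; binat rules would be the alternative if they must originate traffic from their public addresses.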