Anything in your modem logs? DOCSIS layer 2 is a strange beast :)
Any cabling issues, such as attenuators or splitters, behind the modem?
Regards
Patrick

> On Aug 19, 2015, at 2:34 PM, Devin Reade <[email protected]> wrote:
>
> I'm trying to understand an odd behavior during carp failover
> where one uplink goes numb until the demarc equipment is power
> cycled.
>
> Consider the following:
>
>   ISP1-demarc   ISP2-demarc
>             |   |
>    SW1 (Net1)   SW2 (Net2) ----- C
>             |\ /|
>             | X |
>             |/ \|
>          FW-A - FW-B
>             |\ /|
>             | X |
>             |/ \|
>    SW3 (Net3)   SW4 (Net4)
>      (no NAT)   (NAT)
>                 |
>                 H4
>
> ISP1-demarc and ISP2-demarc are the respective ISPs' equipment (outside
> of my control, other than power cycling them). SWn are all unmanaged
> switches.
>
> FW-A, FW-B, and C are all OpenBSD boxes. FW-A and FW-B, in particular,
> are running 5.7-STABLE in a master/slave carp configuration. Things
> are set up so that traffic to/from Net3 is sent via ISP1 (no NAT) and
> traffic to/from Net4 is sent via ISP2 (using NAT on FW-A and FW-B).
> H4 is a host sitting on Net4 in private address space.
>
> Static IPs are used throughout, including on both the SW1 and SW2
> subnets. FW-n are routers, not bridges. Pfsync is running via
> a crossover cable between FW-A and FW-B.
>
> Behavior:
>
> In normal operations everything works as expected. During a carp
> failover, everything for Net3 via ISP1 also works as expected.
> However, during a failover I lose connectivity on Net4, in a qualified
> manner (see below) until ISP2-demarc is power cycled.
>
> The obvious first answer is that ISP2-demarc (which is a Motorola
> cable modem) probably has a limited number of MAC slots available
> to it. However, that doesn't seem quite right. More details ...
>
> Before failover, I set up a 'ping -n' running on H4 and going to
> a host elsewhere on the Internet (call it EXT). I also set up
> a 'ping -n' on C going to the carp IP of FW-A and FW-B on Net2
> (let's call that Carp2).
>
> Now comes the weird part. If I shut down the master, FW-A, I see
> the following:
>
> 1. the running pings from C to Carp2 continue to work until ^C
> 2. the running pings from H4 to EXT continue to work until ^C
> 3. a concurrent newly created ping from C to Carp2 fails
> 4. a concurrent newly created ping from H4 to EXT fails
> 5. all other outbound traffic from Net4 fails (this is just
>    a generalization of (4)).
>
> If I power cycle ISP2-demarc, sanity returns. That is, until
> FW-A comes back up and FW-B is demoted again. Then I get the same
> type of failures until ISP2-demarc is power cycled again.
>
> Power cycling switch SW2 instead of ISP2-demarc does not affect the
> outcome.
>
> Ok, so how about the MACs? On Net2 we have the following MACs:
>
> - ISP2-demarc-mac (on ISP2-demarc)
> - C-mac (on C)
> - FW-A-mac (physical MAC on FW-A)
> - FW-B-mac (physical MAC on FW-B)
> - Carp2-mac (the virtual MAC used by Carp2, which I've verified
>   to be the same for both FW-A and FW-B when they are respectively
>   running as master).
>
> One wart here, and a difference between Net1 and Net2, is that on
> Net1 both firewalls have their own IPs in addition to the Carp1
> IP. However, on Net2 both firewalls' hostname.if files contain
> only the 'up' keyword; no IP is used on that network until the
> machine becomes the carp master.
>
> So that means that when H4 is pinging EXT, the pings are being
> NAT'd to use the Carp1 IP. Therefore I wouldn't expect a failover
> to cause the modem's MAC slots to overflow.
>
> But the *really* weird part is what is happening with C; why would
> C not be able to ping Carp1 until ISP2-demarc is power-cycled, especially
> with SW2 isolating the latter from Carp1 and C?
>
> And the story with C gets better. If I set up a tcpdump on FW-B's Net2
> interface, I see the following sequence of events:
>
> - before killing FW-A, I see arp requests and CARPv2 advertisements
>   from FW-A (based on the skew), and that's about it (as expected)
> - upon shutting down FW-A, I see a CARPv2 packet from FW-B, and then
>   start seeing the ping request/reply pairs coming in from C (as expected)
> - upon killing and restarting C's ping to Carp2, I no longer see the
>   response on C, but I'm seeing both the request and response in FW-B's
>   tcpdump. On C, I see only the echo response. (NOT expected)
>
> Does this last bit point the finger at SW2 being the culprit (perhaps
> not routing packets to the appropriate NIC port), even though power
> cycling SW2 isn't sufficient to fix the problem?
>
> Any other thoughts?
>
> Devin
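
As a small aid for checking the Net2 MAC list above against what shows up on the wire, here is a minimal Python sketch that derives the virtual MAC a carp interface is expected to use from its vhid. It assumes the usual 00:00:5e:00:01:<vhid> layout that carp shares with VRRP; the function name and the example vhid are hypothetical and not taken from the configuration described in the quoted message.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC expected for a carp group with this vhid."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be in the range 1..255")
    # Fixed prefix shared with VRRP; the last octet is the vhid itself.
    return "00:00:5e:00:01:{:02x}".format(vhid)


if __name__ == "__main__":
    # Hypothetical vhid; substitute the value shown by ifconfig for the carp interface.
    print(carp_virtual_mac(2))
```

Comparing the result with the source addresses that tcpdump prints when run with -e would show whether frames on Net2 leave with the virtual MAC or with a firewall's physical MAC.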

