Yes, what James said, thank you for sharing this info. I think I would have given up at "counting f**king packet sequence numbers."
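For anyone facing the same slog, the sequence-number counting Ryan describes below could probably be scripted. This is only a minimal sketch, assuming you can export the sequence numbers from the capture in arrival order and that they are 16-bit wrapping counters (as RTP uses). The name `analyze_sequence` and its parameters are hypothetical and not part of Ryan's actual workflow:

```python
def analyze_sequence(seqs, modulus=1 << 16):
    """Scan sequence numbers in arrival order.

    Returns (reordered, missing):
      reordered: list of (arrival_position, seq) for packets that arrived
                 behind the highest sequence number already seen
                 (late arrivals or duplicates)
      missing:   sequence numbers that were skipped and never showed up
    Assumes a wrapping counter of size `modulus` (16-bit by default).
    """
    reordered = []
    missing = set()
    expected = None  # next sequence number we expect to see

    for position, seq in enumerate(seqs):
        if expected is None:
            expected = (seq + 1) % modulus
            continue

        # Distance from the expected value, taking wrap-around into account.
        delta = (seq - expected) % modulus

        if delta == 0:
            # In order.
            expected = (seq + 1) % modulus
        elif delta < modulus // 2:
            # Jumped ahead: everything skipped is provisionally missing.
            for k in range(delta):
                missing.add((expected + k) % modulus)
            expected = (seq + 1) % modulus
        else:
            # Behind the newest packet seen: a late arrival or a duplicate.
            reordered.append((position, seq))
            missing.discard(seq)

    return reordered, sorted(missing)


if __name__ == "__main__":
    # Packet 102 arrives after 103 (reordered); 105 never arrives (lost).
    print(analyze_sequence([100, 101, 103, 102, 104, 106, 107]))
```

A recurring cluster of reordered or missing entries at a regular interval in the output would be the kind of pattern Ryan found by hand.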

On Mon, Apr 30, 2018 at 10:13 AM James Buchanan <[email protected]> wrote:

> Painful as this was, hats off to you for writing this up and sharing. Much
> appreciated!
>
> On Mon, Apr 30, 2018 at 3:36 PM, Ryan Huff <[email protected]> wrote:
>
>> So here is a *neat* little situation I ran into recently that is worth
>> sharing and reading; if this saves a life, it was worth the crap I had to
>> go through …..
>>
>> == The Scenario ==
>>
>> - Expressway C/E 8.10.3 cluster over WAN (2 Control Peers, 2 Edge Peers)
>> - Customer deployed and managed SD-WAN solution in front of the Edge
>>   cluster to the Internet (with two separate transport carriers). I think
>>   it was Palos, but we’ll call it a whitebox’ed solution for our purposes
>> - Using MRA and B2B Expressway configs
>> - UAT for MRA and B2B is accepted and works great
>>
>> == The Problem ==
>>
>> The customer applies the zone/search rule config in Expressway for CMR
>> and notices that randomly, during a presentation session in the CMR, the
>> BFCP server (AKA, the WebEx meeting) will close the BFCP presentation to
>> the endpoint coming from the customer’s Expressway; all other BFCP clients
>> are still receiving the BFCP presentation. That’s right, it *appears*
>> that WebEx *kicked* the BFCP participant coming from the customer’s
>> Edge, but not because the BFCP server closed the session (all other
>> participants remain)! Although the length of time into the presentation
>> was random’ish, it would always happen to the endpoint at some point,
>> generally around the 2 minute’ish mark.
>>
>> == The Diagnosis ==
>>
>> Although random, a consistent’ish length would seem to suggest a timer /
>> re-invite of some flavor, and that would be wrong, as ultimately uncovered.
>> Sparing you all the gory tales of escalation and vendor bus underskirt
>> sliding, the issue was, in fact, the SD-WAN solution itself.
>>
>> == The Explanation & The Fix ==
>>
>> What was happening is that every 120 seconds or so, the BFCP server
>> (WebEx meeting) would send a UDP BFCP packet to all the BFCP presentation
>> subscribers. The customer’s SD-WAN solution was *identifying* these
>> packets, according to the customer (gotta love layer 7 capable firewalls
>> 😊), and queueing them onto a physically different link than the one the
>> stream was on, thus creating *physical asymmetry, delay and latency*. I
>> specifically requested that all inspection capabilities be turned off for
>> the traffic, but I guess that isn’t the same as “identifying the traffic” ….
>> Lol. In a TCP stream, this would likely be tolerated to a degree as packet
>> loss or delay and/or jitter, and the sender would simply retransmit ….. but
>> we are dealing with *UDP* here, no bueno.
>>
>> To resolve, the customer had to identify and classify the traffic and
>> force an active/failover transmission through the SD-WAN solution for that
>> traffic, rather than a “load balance” transmission behavior.
>>
>> == Sleuthing & The Closing ==
>>
>> In hindsight, it seems simple and makes perfect sense, right? However, when
>> your only visibility into the network is the Expressway servers themselves,
>> it can be *very* challenging to discover, because at that point in the
>> topology everything looks like it is coming from and going to the VIP on
>> the firewall pair. So how do you catch something like this when you can’t
>> see everything? *PCAPs*. *Literally counting f**king packet sequence
>> numbers for 6 hours and identifying a consistent pattern of packets coming
>> out of order and being “lost”.*
>>
>> -Ryan-
>>
>> _______________________________________________
>> cisco-voip mailing list
>> [email protected]
>> https://puck.nether.net/mailman/listinfo/cisco-voip
