Except in those (becoming less rare than hardware failure) instances where the 
software controlling the failover process is the actual cause of the outage.

Owen

On Jun 23, 2011, at 5:44 AM, -Hammer- wrote:

> Agreed. At an enterprise level, there is no need to risk extended downtime to 
> save a buck or two. Redundant hardware is always a good way to keep Murphy 
> out of the equation. And as far as hardware failures go, it's not that 
> common. Nowadays it's the bugs in overly complicated code on your gear that 
> get you first. I miss IOS 11.3.....
> 
> -Hammer-
> 
> 
> 
> On 06/23/2011 01:07 AM, Bret Palsson wrote:
>> That's fine if you are running a website. When it comes to 
>> telecommunications, a 15 minute outage is pretty huge. Especially with 
>> certain types of customers: emergency services for example.
>> 
>> -Bret
>> 
>> On Jun 23, 2011, at 12:02 AM, Hank Nussbacher wrote:
>> 
>>   
>>> At 20:42 22/06/2011 -0700, Jason Roysdon wrote:
>>> 
>>> Let me be a bit of a heretic here.  How often does your router fail?  Or 
>>> your firewall?  In the 25 years I have gone into customers I have found 
>>> when they did a cross setup as proposed below by Bret and Jason, only one 
>>> person truly knew the complete setup and if something broke only he was 
>>> able to fix it.  There is never complete printed documentation: routing 
>>> design, IPs on all interfaces, subnetting schematic, etc.  And if there was 
>>> at one point, after 2 years it was outdated and never updated and only the 
>>> *1* guy knew the changes in his head.
>>> 
>>> In that kind of situation, when something stopped working they always had 
>>> to call in the "guru" to fix it.  On the other hand, a simple design of 
>>> only *one* path (pick either left or right side of each of the ASCII arts), 
>>> made it possible that even junior network engineers as well as technicians 
>>> called in on emergency with 4 hours notice, were able to fix the situation 
>>> much more quickly than the "cross" design.  And the MTBF on a single path 
>>> solution, IMHO, is around 3-4 years.  And if you need redundancy, keep a 
>>> spare box on a shelf, completely loaded with the latest config so that it 
>>> can be hot-swapped in within 15 minutes of failure.
>>> 
>>> This 1-path design is not for everyone.  The vendors always recommend the 
>>> "cross" design since they sell 2x the amount of boxes but I have found that 
>>> life works fine with just a 1-path design as well.
>>> 
>>> -Hank
>>> 
>>> 
>>>     
>>>> I second the static routes, especially from a simplicity standpoint.  Add
>>>> in a pair of layer two switches to simplify further:
>>>> 
>>>> 
>>>>     +--------+    +--------+
>>>>     | Peer A |    | Peer A |<-Many carriers. Using 1 carrier
>>>>     +---+----+    +----+---+    for this scenario.
>>>>         |eBGP          | eBGP
>>>>         |              |
>>>>     +---+----+iBGP+----+---+
>>>>     | Router +    + Router |<- Routers. Not directly connected
>>>>     +-+------+    +------+-+
>>>>       |                  |
>>>>     +-+------+    +------+-+
>>>>     |L2Switch|----|L2Switch|<- Layer 2 switches, can be stacked
>>>>     +--------+    +--------+
>>>>       |                  |
>>>>     +-+------+    +------+-+
>>>>     |Act. FW |----|Pas. FW |<-Firewalls Active/Passive.
>>>>     +--------+    +--------+
>>>> 
>>>> You can lose all of the left leg, or all of the right leg, and still be
>>>> up.  If you want to complicate things, you can add crossing links
>>>> between it all, but again, beyond BGP and VRRP, this is a very simple
>>>> design you can easily troubleshoot at 3AM.  It's also much easier to
>>>> document the troubleshooting steps (so you can go on vacation and
>>>> someone else can solve without calling you) and test upgrades.
>>>> 
>>>> You can nearly evenly split the traffic by having a VRRP VIP on each
>>>> edge router, with the other router backing up the first.  The firewalls
>>>> can have two static routes, one to each VIP, and this will roughly
>>>> load-balance the traffic out on a packet basis.  As you peer with the
>>>> same ISP, this will work just fine.  If they have an outage, your edge
>>>> routers will learn, and even if the circuit drops it'll know, and
>>>> basically the VIP will just redirect traffic to the other router.
>>>> 
>>>> Now all your firewalls have to do is maintain stateful session
>>>> information, not OSPF.
>>>> 
>>>> If you had two different ISPs (especially if they are not roughly evenly
>>>> connected), then not having intelligence of the BGP paths in your
>>>> firewalls can cause an extra hop when it hits the router with the longer
>>>> path, which will redirect it to the router with the shorter path.
>>>> 
>>>> Speaking from a Cisco/HSRP point of view, you could be more intelligent
>>>> (read: more complicated, and complication means harder troubleshooting
>>>> and more documentation needed) during problem periods by having the VIP
>>>> move routers automatically based on the WAN link dropping and/or a route
>>>> beyond it being lost (others can comment on whether VRRP supports this).
>>>> This would save one hop to the "broken" router when the BGP path or WAN
>>>> is down.
>>>> 
>>>> Jason Roysdon
>>>> 
>>>> On 06/22/2011 06:07 PM, Bret Palsson wrote:
>>>>       
>>>>> On Wed, Jun 22, 2011 at 5:33 PM, PC<[email protected]>  wrote:
>>>>> 
>>>>>         
>>>>>> Who makes the firewall?
>>>>>> 
>>>>>> 
>>>>>>           
>>>>> Juniper SSG. We use NSRP and replicate all the RTOs. We have hitless on
>>>>> the Firewalls, have for years. We're now peering with our own carriers
>>>>> vs. using our datacenter's mix.
>>>>> 
>>>>> A static route from the junipers to the VIP (VRRP) is probably the way to
>>>>> go. I think.
>>>>> 
>>>>>> To make this work and be "hitless", your firewall vendor must support
>>>>>> stateful replication of routing protocol data (including OSPF).  For
>>>>>> example, Cisco didn't support this in their ASA product until version
>>>>>> 8.4 of code.
>>>>>> 
>>>>>> Otherwise, a failover requires OSPF to re-converge -- and quite frankly,
>>>>>> will likely cause some state of confusion on the upstream OSPF peers,
>>>>>> loss of adjacency, and a loss of routing until this occurs.  It's like
>>>>>> someone just swapped a router with the same IP to the upstream device --
>>>>>> assuming your active/standby vendor's implementation only presents
>>>>>> itself as one device.
>>>>>> 
>>>>>> However, once this is successful your current failover topology should
>>>>>> work fine -- even if it takes some time to fail over.
>>>>>> 
>>>>>> In my opinion though, unless the firewall is serving as "transit" to
>>>>>> downstream routers or other layer 3 elements, and you need to run OSPF
>>>>>> to it (and through it) as a result, it's often just easier to static
>>>>>> default route out from the firewall(s) and redistribute a static route
>>>>>> on the upstream routers for the subnets behind the firewalls.  It also
>>>>>> helps ensure symmetrical traffic flows, which is important for stateful
>>>>>> firewalls and can become moderately confusing when your firewalls start
>>>>>> having many interfaces.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 22, 2011 at 4:27 PM, Bret Palsson<[email protected]>  wrote:
>>>>>> 
>>>>>>           
>>>>>>> Here is my current setup in ASCII art. (Please view in a fixed width
>>>>>>> font.) Below the art I'll write out the setup.
>>>>>>> 
>>>>>>> 
>>>>>>>     +--------+    +--------+
>>>>>>>     | Peer A |    | Peer A |<-Many carriers. Using 1 carrier
>>>>>>>     +---+----+    +----+---+    for this scenario.
>>>>>>>         |eBGP          | eBGP
>>>>>>>         |              |
>>>>>>>     +---+----+iBGP+----+---+
>>>>>>>     | Router +----+ Router |<-Netiron CERs Routers.
>>>>>>>     +-+------+    +------+-+
>>>>>>>       |A   `.P    A.'    |P<-A/P indicates Active/Passive
>>>>>>>       |      `.  .'      |      link.
>>>>>>>       |        ::        |
>>>>>>>     +-+------+'  `+------+-+
>>>>>>>     |Act. FW |    |Pas. FW |<-Firewalls Active/Passive.
>>>>>>>     +--------+    +--------+
>>>>>>> 
>>>>>>> 
>>>>>>> To keep this scenario simple, I'm multihoming to one carrier.
>>>>>>> I have two Netiron CERs. Each has an eBGP connection to the same peer.
>>>>>>> The CERs have an iBGP connection to each other.
>>>>>>> That works all fine and dandy. Feel free to comment, however, if you
>>>>>>> think there is a better way to do this.
>>>>>>> 
>>>>>>> Here comes the tricky part. I have two firewalls in an Active/Passive
>>>>>>> setup. When one fails the other is configured exactly the same
>>>>>>> and picks up where the other left off. (Yes, all the sessions etc. are
>>>>>>> actively mirrored between the devices)
>>>>>>> 
>>>>>>> I am using OSPFv2 between the CERs and the Firewalls. Failover works
>>>>>>> just fine; however, when I fail an OSPF link that has the active
>>>>>>> default route, ingress traffic still routes fine and dandy, but egress
>>>>>>> traffic doesn't. Both Netirons are set up to advertise themselves as
>>>>>>> the default route via OSPF.
>>>>>>> 
>>>>>>> What I'm wondering is whether OSPF is the right solution for this. How
>>>>>>> do others solve this problem?
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Bret
>>>>>>> 
>>>>>>> 
>>>>>>> Note: Since lately IPv6 has been a hot topic, I'll state that after we
>>>>>>> get the BGP all figured out and working properly, IPv6 is our next
>>>>>>> project. :)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>             
>>>>>>           
>>>>>         
>>>     
>> 
>>   
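
For a rough sense of the numbers being argued over in the quoted thread, here is a minimal back-of-the-envelope sketch in Python. The 3-4 year MTBF and the 15-minute shelf-spare swap are Hank's figures; everything else is an assumption made for illustration: the 3.5-year midpoint, treating the 4-hour call-in as the full repair time, independent failures on the two legs, and a failover mechanism that never misbehaves. That last assumption is exactly the one disputed at the top of this message, so the redundant-pair figure is optimistic.

```python
# Back-of-the-envelope availability from MTBF and repair time (MTTR).
# From the thread: MTBF of 3-4 years, 15-minute swap of a pre-configured spare.
# Assumed here: 3.5-year midpoint, 4-hour repair without a spare,
# independent failures, and a failover mechanism that never fails itself.

HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Expected minutes of outage per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


mtbf = 3.5 * HOURS_PER_YEAR

# Single path, spare box on the shelf, swapped in within 15 minutes.
single_with_spare = availability(mtbf, 0.25)

# Single path, no spare: someone is called in and fixes it in 4 hours.
single_no_spare = availability(mtbf, 4.0)

# Redundant pair: down only when both legs are down at the same time.
redundant_pair = 1.0 - (1.0 - single_no_spare) ** 2

for label, a in [
    ("single path + shelf spare", single_with_spare),
    ("single path, 4h repair", single_no_spare),
    ("redundant pair (ideal failover)", redundant_pair),
]:
    print(f"{label:34s} {a:.7f}  {downtime_minutes_per_year(a):8.3f} min/yr")
```

Averaged over many years the single-path numbers look small, but the model says nothing about how a single 15-minute outage is felt by, say, an emergency-services customer, and it gives the redundant pair no penalty for bugs in the failover software. Adding a term for that is left out here because the thread gives no figure for it.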

