A rather late reply on this topic, but I happen to agree with Jared.
I did start out thinking that the BGP error handling idea was a new
twist on a solution and very interesting, but after all this debate
and my own introspection, I am actually back in the camp of not believing
in the idea of soft corrections in the face of malformed updates and such
situations. I for one have not seen a lot of real data from the SPs and
other newer users of BGP in my dealings of many years that reflect the need for
such solutions in the real world. I do remain convinced that this is
still a very interesting and challenging and 's..y' software and protocol
development point but from a practical point of view I am not sure.
I have also seen discussions in development teams about how to structure
code to handle such treat-as-withdraw scenarios, and it seems to me quite
possible that doing so would make the code more fragile rather than more robust.
If only I had a penny for every time I thought I had written bug-free
automatic bug handling code... :)
0.02,
Chandra.
________________________________
From: Jared Mauch <[email protected]>
To: Michael Long <[email protected]>
Cc: [email protected]; [email protected]; Tony Li <[email protected]>
Sent: Thursday, January 3, 2013 11:58 AM
Subject: Re: [Idr] [GROW] I-D Action: draft-ietf-grow-ops-reqs-for-bgp-error-handling-06.txt
On Jan 3, 2013, at 2:35 PM, Michael Long wrote:
>
> On Jan 3, 2013, at 10:00 AM, Tony Li <[email protected]> wrote:
>>
>>
>> All of the marketing that you're doing here is positioning this as a
>> 'solution'. It's not. Yes, it will stop the flap, but it does NOTHING to
>> fix or deal with the underlying bug. All it does is gloss it over, and as
>> such, it will have implications in the field whereby this papers over real
>> bugs and we have now promoted BGP errors into RIB errors. That's NOT making
>> things easier to debug, that's just applying a band-aid.
>
> I understand what you are saying and I agree 100%; however, from my
> operations perspective the "fix" is the same. Either upgrade to fixed code or
> policy out the offending announcement. I would rather deal with a customer
> routing issue vs a frantic call from our NOC saying 15+ att peers globally
> are bouncing. The latter has a much bigger impact on our network.
I'm very concerned with the case of ignoring a route update and having a
month-long discussion about why some route is missing from the $carrier_a
network when it's being sent from $carrier_b and they show it going out just
fine.
You don't know there's an issue until someone reports it, and the long tail to
problem resolution takes forever.
> I can live with a couple of /24s not working for a few customers. I can't
> have 15+ peers bouncing because of bad updates and even more peers bouncing
> because of missed keepalives due to the CPU being pegged trying to deal with
> 15 peers bouncing globally.
While related, this is an implementation defect on the part of vendors and
their poorly optimized TCP and BGP implementations being unable to get their
basic job done. I recall vendors blaming our "slow" system CPU, then finally
fixing their logic defect that always returned 1 or 0 when it thought it was
idle. (Sometimes those if statements look really complex.)
>> A more constructive way to address the real problem here would be to talk
>> about whether we should even re-establish the session after an error. Long
>> ago, we made an implementation decision to simply retry. That would seem to
>> be the real issue at hand.
>
> I would back this provided adequate logging as to why the session is down. It
> would be much like tripping max-prefixes, where we could hard clear a single
> session for debug. I could live with this.
I certainly agree there needs to be better logging from the vendors.
I remain convinced that attempts to address this problem will create more
complex situations rather than provide the desired result of a stable BGP core.
- Jared