[GROW] RtgDir review: draft-ietf-grow-ops-reqs-for-bgp-error-handling

Geoff Huston Mon, 10 Sep 2012 22:33:49 -0700

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The 
Routing Directorate seeks to review all routing or routing-related drafts as 
they pass through IETF last call and IESG review. The purpose of the review is 
to provide assistance to the Routing ADs. For more information about the 
Routing Directorate, please see 
http://www.ietf.org/iesg/directorate/routing.html


Although these comments are primarily for the use of the Routing ADs, it would 
be helpful if you could consider them along with any other IETF Last Call 
comments that you receive, and strive to resolve them through discussion or by 
updating the draft.

Document: draft-ietf-grow-ops-reqs-for-bgp-error-handling-05.txt 
Reviewer: Geoff Huston
Review Date: 11 September 2012 
IETF LC End Date: 13 September 2012 
Intended Status: Informational

Summary: 
I have some major concerns about this document that I think should be resolved 
before publication. I also have some minor concerns that relate to the manner 
of expression of these requirements. I do not have major concerns 
about the technical content of the requirements described in this document.

Comments: 
This document is not clearly written and is difficult to understand. 

The requirements are scattered across voluminous text, which is unhelpful. I 
would've preferred to read a document which managed to enumerate the same 
requirements in under 8 pages of text, while the current count of 28 pages 
appears to be consumed by prolix and repetitive text that contributes neither 
to the precision of the description of the requirements, nor to the description 
of the rationale for the requirements.

At its current length, and with its density of expression and level of 
repetition, I would suggest that its utility to future readers is 
unfortunately compromised. This is a shame, as a well-considered set of 
operational requirements for BGP error handling is buried within this 
document.

Major Issues: 
There is a major issue here in terms of overall readability: the document 
lacks conciseness of expression, and the subject material is not structured 
in an organised and coherent manner.

More specifically, I take issue with the classification approach used in 
Section 2, and I am of the opinion that it should be rewritten to aid clarity 
and readability. I find it confusing to see "Critical" and "Semantic" error 
classifications. It would make more sense to me to call these categories 
"Critical" and "Non-Critical". I would also suggest using this classification 
to define the proposed handling - i.e. Critical Errors are such that the BGP 
message framing has been lost, and it is necessary to restart the session or 
undertake some other error handling mechanism that would re-establish BGP 
message framing, and Non-Critical Errors are such that BGP message framing has 
NOT been lost, and the error recovery process can be managed through various 
forms of local actions and potentially some form of additional BGP 
protocol-level interaction that would not require a session tear-down to repair.
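
To illustrate the distinction I have in mind, here is a minimal sketch of a 
classifier keyed solely on whether message framing survives. It is not taken 
from the draft. The header layout (16-octet all-ones marker, 2-octet length, 
1-octet type) and the 19..4096 length bounds are those of RFC 4271; the names 
classify, ErrorClass and body_is_valid are hypothetical, and body_is_valid is 
only a placeholder for whatever attribute and NLRI checks an implementation 
actually performs.

```python
import struct
from enum import Enum

MARKER = b"\xff" * 16   # RFC 4271 header marker (all ones)
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MAX_LEN = 4096          # maximum message length in RFC 4271


class ErrorClass(Enum):
    NONE = "none"
    CRITICAL = "critical"          # framing lost: restart or re-synchronise
    NON_CRITICAL = "non-critical"  # framing intact: repair without tear-down


def classify(header: bytes, body_is_valid: bool) -> ErrorClass:
    """Classify one received message by whether framing can still be trusted.

    body_is_valid is a stand-in (assumption) for the receiver's own
    attribute/NLRI validation; no real parsing is done here.
    """
    if len(header) != HEADER_LEN or header[:16] != MARKER:
        return ErrorClass.CRITICAL
    (length,) = struct.unpack("!H", header[16:18])
    if not HEADER_LEN <= length <= MAX_LEN:
        # The length field is unusable, so the next message boundary is unknown.
        return ErrorClass.CRITICAL
    if not body_is_valid:
        # The message boundary is known; only this message's content is suspect.
        return ErrorClass.NON_CRITICAL
    return ErrorClass.NONE
```

The point of the sketch is that the handling follows directly from the 
category: a CRITICAL result leaves no option but to re-establish framing, 
while a NON_CRITICAL result leaves the session usable and confines the repair 
to the offending message.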

I am also at a loss to understand the role of section 3 in this requirements 
document. It appears to be making the case that the current BGP error handling 
approach is ill-suited to operational requirements and that different forms of 
error handling should be placed as requirements for the protocol. I would 
conventionally expect to see these arguments appear in section 1 of this 
document, as part of the argument for the motivation for a new set of error 
handling requirements. This is perhaps a specific instance of the previously 
mentioned issue that this document could benefit from some careful thought 
about the organisation of the presented material.

I also note that Section 6, Operational Toolset for Monitoring BGP, represents 
a scope creep for this document. My concern here is that any general comments 
about monitoring BGP would not normally be expected to be enumerated in a 
document that was intended to address the requirements for BGP's handling of 
error conditions. I am not aware whether the Working Group has considered the 
possibility of separately addressing error handling and operational monitoring 
in two distinct operational requirements documents, but from my review of 
this document it does appear that a case can be made here for this form of 
clear delineation.

In any case, I would suggest that the document would benefit from a major 
revision that was focussed on a clear enumeration of the requirements for 
error handling rather than the current document form of a somewhat less 
structured collection of comments on the BGP NOTIFICATION message and its 
current method of handling, comments on existing work in progress on error 
handling approaches and mechanisms and the inclusion of consideration of error 
handling scope, and the considerations behind re-interpretation of certain 
forms of erroneous UPDATES as implicit WITHDRAW messages. At present all these 
concepts have been added into the document in a manner that tends to blur the 
distinction between a description of the requirement itself and the motivation 
for this requirement.


Minor Issues: 
Last sentence of the abstract: namely the "overview of a set of enhancements to 
BGP-4" is inconsistent with the document's purpose as represented in the title 
("Requirements for Enhanced Error Handling Behaviour") or in later parts of the 
document. Needs revision.

Introduction: first sentence - "numerous incidents..." is imprecise and 
uninformative - perhaps dropping this adjective would help here. Also in this 
sentence I would suggest changing "due to the" to "as a consequence of the". 

Introduction: second sentence - "the deployments of the protocol have changed 
within modern networks" does not parse for me. Is this intending to say that 
some current implementations of this protocol deviate from the 
standards-defined behaviour?

Introduction: This entire section could be reduced by noting that "BGP's 
current error handling behaviour, as defined in RFC 4271, defines a single error 
handling response, namely that of session reset. This response has significant 
impacts within an operational environment. This memo proposes a set of 
requirements for further refinement of the standard behaviour for error 
handling in BGP."

The remainder of the document would benefit considerably from a similar 
editorial pass. It is simply far too prolix, and this detracts from the 
effectiveness of the document as a description of a set of requirements.

section 1.1, first sentence "... are designed to be conducive to this role" - 
frankly I have no idea what this means. Is it "consistent with this role"? But 
even then it makes no sense. Indeed the first sentence defeats me as to its 
intended purpose.

Section 1.1, second sentence - there is some jarring imprecision here that 
should be deleted - the "relatively small" amount of NLRI information makes no 
sense to me as I am unsure to what this "relative" comparison is being made.

Section 1.1, third sentence. This sentence, "In this case, it is the 
expectation.." is wordy and poorly expressed. I was thinking of ways to say 
this more concisely, but maybe it would be better to remove it completely.
 
Section 1.1, last sentence. This sentence, about the expectation to be able to 
use sub-optimal paths, is a bit of a martian for me - the concept is introduced 
here without warning and without context. I thought this was a requirements 
document for error handling, and it is unclear how this statement relates to 
that purpose.

Section 1.1, second paragraph - "Traditional network architectures _use_ an..."

Section 1.1, second paragraph - the author is implying that the requirements 
for IGPs and EGPs differ in terms of robustness. It would be helpful if this 
claim was substantiated in some manner, in so far as this reviewer does not see 
much of a difference at all - both protocols have a very high requirement for 
robust operation from this reviewer's perspective.

Section 1.1, third paragraph - yes, BGP carries more information, but the case 
that this augmented use provides justification for an altered error handling is 
weak, and in my view superfluous to the document's purpose. The previous 
paragraph provides adequate motivation and this third paragraph appears to be 
another repetition of the basic assertion that "BGP plays a critical role in 
network operation, and BGP error handling should not cause a hiatus in the 
supply of information provided by the operation of BGP."

Section 1.1, fourth paragraph appears to be saying that: "BGP systems carry 
large volumes of information, and the time taken to recover from an 
error-triggered session reset is now a significant factor in terms of overall 
network robustness. Error handling approaches that limit the scope of error 
recovery to those NLRIs mentioned in the erroneous BGP UPDATE message should be 
considered within a requirement set for error handling."

It is possible to reduce section 1.1 to two paragraphs and a more concise set 
of statements about the problems that the current standards-defined error 
handling response poses to network operators.

Section 1.2 - here the first sentence is a restatement of the document's 
purpose, already stated in the Abstract and in section 1 - there is no need to 
restate it here. The rest of this section is again very wordy. It may be worth 
considering a more concise restatement of these requirements, namely that error 
handling should avoid the use of session resets where possible, error handling 
should, where possible, be limited in scope to those NLRI UPDATEs that can be 
associated with the error condition, and where session reset is considered to 
be unavoidable, various forms of more graceful session restart should be 
considered. Furthermore, as a more general BGP requirement, the inclusion of 
mechanisms to allow for operational monitoring of BGP should be stated as an 
operational requirement.

Section 2 - I have trouble parsing the structure of this section - perhaps it's 
because the first four paragraphs here are a more verbose repetition of the 
information presented in sections 2.1.1 and 2.1.2.

Section 3 - paragraph 3 - I am confused by the purpose of the second half of 
this paragraph, starting with the sentence beginning with "It should, however, 
be considered if this view is valid..." The first half of the paragraph is 
discussing the "treat as withdraw" in the context of iBGP, but the second half 
of the paragraph does not appear to conclude this discussion.

Section 4 - paragraph 3 - this is an example of an embedded "requirement" that 
should be avoided. It would be far clearer to pull out all these requirements 
and enumerate them and for each one outline concisely the rationale for the 
requirement and its intended effect on the operation of BGP. 

Section 4 - paragraph 4 - contains another example of this embedded "requirement".

Section 5 - paragraph 2 - This sentence: "Clearly, there is some utility to 
this requirement, as error conditions in BGP are, in general, exited from."  
What does this mean? I am at a bit of a loss in reading section 5, as, once 
more, there are embedded "requirements" and a lot of repetition of material 
from earlier sections.


Nits: 
Abstract, para 1, sentence 3 - s/strict/standards-defined/ and s/message 
causing/message, causing/

Section 1.1, second paragraph ... "As such, BGP has become an IGP" is better 
expressed as "As such, iBGP has become an IGP"

Section 1.2 s/UPDATE packet/UPDATE message/

Section 2. first sentence - expand the first use of "DFZ"

Section 2 - why does this document use "BGP-4" and "BGP"? - please pick one 
term or the other. I suggest using "BGP" uniformly through the document.


