Ketan Talaulikar has entered the following ballot position for
draft-ietf-grow-bmp-bgp-rib-stats-14: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to 
https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ 
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-grow-bmp-bgp-rib-stats/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Thanks to the authors and the WG for this document.

Please find below certain points that I would like to discuss.

<discuss-1> Semantics of routes, paths, primary, and backup.

Section 2 of this document says:
Primary route: A route to a prefix that is considered the best route by the BGP
decision process [RFC4271] and actively used for forwarding traffic to that
prefix. Backup route: A backup route is eligible for route selection, but it is
not selected as the primary route and is also installed in the Loc-RIB. It is
not used until all primary routes become unreachable. Backup routes are used
for fast convergence in the event of failures.

Consider a BGP route for destination prefix x/y that is a multipath:
x/y via BGP NH1 (path1) (best)
    via BGP NH2 (path2) (multipath - say ECMP)
    via BGP NH3 (path3) (backup)
    via BGP NH4 (path4) (valid but not best/multipath/backup)
    via BGP NH5 (path5) (invalid - for whatsoever reason)

This is a single route. The best/multipath/backup/valid/invalid/etc are
qualifiers of its paths. Except for two stats that refer to paths (stale and
suppressed), everything is referring to routes. I would like to discuss the
semantics of route vs path. It seems to me like some of the stats are for paths
and not routes.
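
To make the distinction concrete, the example above can be sketched in a few
lines of Python. The Path and Route classes and the role labels are
illustrative assumptions, not taken from the draft; the sketch only shows that
counting routes and counting paths give different numbers for the same RIB
content.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Path:
    next_hop: str
    role: str  # hypothetical labels: best, multipath, backup, valid, invalid


@dataclass
class Route:
    prefix: str
    paths: list = field(default_factory=list)


# The single route x/y from the example, with its five paths.
route = Route("x/y", [
    Path("NH1", "best"),
    Path("NH2", "multipath"),
    Path("NH3", "backup"),
    Path("NH4", "valid"),
    Path("NH5", "invalid"),
])

loc_rib = [route]

# A "route" statistic counts prefixes; a "path" statistic counts entries.
route_count = len(loc_rib)
path_count = sum(len(r.paths) for r in loc_rib)
paths_by_role = Counter(p.role for r in loc_rib for p in r.paths)
```

Here route_count is 1 while path_count is 5, so a stat described as "number of
backup routes" is ambiguous unless the document says which of the two is being
counted.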

In general, I think the use of the terms primary/backup which are related to
forwarding plane aspects can be confusing. Instead, perhaps using terms that
are more suitable for BGP Loc-RIB would be better? I've suggested some of them
above for consideration. Also refer to draft-ietf-grow-bmp-path-marking-tlv -
the terms of stats should be aligned across the BMP documents?

Furthermore, there is a wrong assumption that backup paths are only activated
when all primary paths are down. This is very much implementation dependent.
Some implementations have a 1:1 provisioning of primary/backup - where the
backup would get used when its specific primary goes down - this draws on the
FRR notion in the forwarding planes. Refer to the definition in
draft-ietf-grow-bmp-path-marking-tlv.
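
As a rough illustration of the two behaviors, here is a minimal Python sketch.
Both function names are hypothetical and the models are deliberately
simplified; real implementations may differ from either one.

```python
def backup_used_shared(primaries_up):
    """Model in the draft's wording: one backup, used only when
    every primary path is down."""
    return not any(primaries_up)


def backups_used_one_to_one(primaries_up):
    """1:1 model: each primary has its own backup, which is used
    as soon as that specific primary goes down."""
    return [not up for up in primaries_up]


# Two primary paths: the first is up, the second has failed.
state = [True, False]
shared_result = backup_used_shared(state)
one_to_one_result = backups_used_one_to_one(state)
```

With one primary still up, the shared model reports no backup in use, while
the 1:1 model reports that the second backup is already in use. A stat that
counts "backup" entries means different things under the two models.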

These clarifications have implications on several of the stats as they are
defined currently.

<discuss-2> Section 3 has the following text and Section 4 introduces a table
that brings up an interesting aspect.

"This section defines different statistics type for Adj-RIB-In and Adj-RIB-Out
monitoring type. Some of these statistics are also applicable to Loc-RIB; refer
to Section 4 for more details."

Types 24 through 28 are applicable to both Adj-RIB-In and Loc-RIB. How does
one know which is being reported? Can this be clarified? This seems to be the
first document introducing such overloaded types, but I don't find the reason
why this is being done. There is also a sort of duplication, with the same
stat being both global and per AFI/SAFI - is there any guidance on whether
only one of them needs to be supported (thereby avoiding race conditions and
discrepancies in their totals)?

It is important to clarify these aspects if this is going to set a
precedent/guidance for other similar stats in future BMP documents.

<discuss-3> Section 5 - Operational considerations - is not entirely about
operational considerations. There are references to "implementations" in
several places and it is not clear if this is on the router side or the
collector/monitoring side - this needs to be clarified so that expectations
on implementations on either side are clear.

As an example: "Implementations MUST track discontinuities and log this
information." - which side is this for?

Several aspects are not really operational considerations but implementation
considerations. Please consider a "Procedures" section for documenting some of
those aspects.

As an example, how is this text an operational consideration: "Some statistics
are dependent on feature configurations, such as GR, LLGR, and RPKI, so the
corresponding statistics are only sent when these features are enabled. This
statistics include Type 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
39, 40, 41, 42, and 43."

Another example is "A BMP implementation MUST ignore unrecognized stat types
upon receipt and MUST exclude unsupported stat types upon transmission." ...
this is a normative protocol behavior that is buried in the Operational
Considerations section.
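
For reference, the receive-side half of that rule could look like the minimal
Python sketch below. It assumes the stat TLV layout from RFC 7854 (2-byte
type, 2-byte length, then the value); the function name and the set of known
types are placeholders, not something defined by the draft.

```python
import struct

# Placeholder set; a real receiver would list the stat types it supports.
KNOWN_TYPES = {7, 8}


def parse_stats(buf, known=KNOWN_TYPES):
    """Return {stat_type: raw value bytes} for known types.

    Unrecognized types are skipped using their length field, so one
    unknown stat does not prevent parsing of the stats that follow it.
    Trailing bytes shorter than a TLV header are ignored in this sketch.
    """
    stats = {}
    offset = 0
    while offset + 4 <= len(buf):
        stat_type, stat_len = struct.unpack_from("!HH", buf, offset)
        offset += 4
        value = buf[offset:offset + stat_len]
        if len(value) < stat_len:
            raise ValueError("truncated stat TLV")
        offset += stat_len
        if stat_type in known:
            stats[stat_type] = value
    return stats


# Three TLVs: a known one, an unknown type 999, and another known one.
example = (
    struct.pack("!HHQ", 7, 8, 100)
    + struct.pack("!HHI", 999, 4, 1)
    + struct.pack("!HHQ", 8, 8, 5)
)
parsed = parse_stats(example)
```

The point is that this is parsing behavior on the receiving station, which is
why it reads as a protocol procedure rather than an operational consideration.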

"Operators MAY consider rate-limiting statistic updates to minimize performance
impact on control-plane processes." - why is this not at least a SHOULD and
perhaps even a MUST?


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I note that the WGLC for this document was not cross-posted to the IDR WG for
soliciting review as required by the GROW WG charter. I hope this can be
avoided going forward.

I support the DISCUSS positions of both Eric and Gunter. Some of their points
are related to the points that I have raised in my ballot as well.

I also have some comments/suggestions that I hope will help improve the
document.

1) Type = 37: (64-bit Gauge) Current number of routes in per-AFI/SAFI
post-policy Adj-RIB-In not found by verifying route origin AS number through
the ROA of RPKI [RFC6811]. The value is structured as: 2-byte AFI, 1-byte SAFI,
followed by a 64-bit Gauge.

The phrase 'not found by verifying ...' is confusing. I assume this refers to
routes that didn't find any match in the RPKI cache? If so, please clarify.
This also applies to type 43.

2) Type = 39: (64-bit Gauge) Current number of routes refused to be sent by
exceeding the maximum AS_PATH length supported by the local configuration.

The phrase 'refused to be sent ...' is confusing. Perhaps you mean routes that
were not sent because ... This also applies to type 40.
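
As a side note on the per-AFI/SAFI value layout quoted in comment 1 (2-byte
AFI, 1-byte SAFI, 64-bit Gauge), a minimal Python sketch of the encoding
follows. The 2-byte type and 2-byte length header is assumed from the stat TLV
format in RFC 7854; the function names and the sample values are illustrative
only.

```python
import struct


def encode_afi_safi_gauge(stat_type, afi, safi, gauge):
    """Build one stat TLV whose value is AFI (2) + SAFI (1) + Gauge (8)."""
    value = struct.pack("!HBQ", afi, safi, gauge)  # 11 bytes, no padding
    return struct.pack("!HH", stat_type, len(value)) + value


def decode_afi_safi_gauge(tlv):
    """Inverse of encode_afi_safi_gauge for a single TLV."""
    stat_type, stat_len = struct.unpack_from("!HH", tlv, 0)
    afi, safi, gauge = struct.unpack_from("!HBQ", tlv, 4)
    return stat_type, stat_len, afi, safi, gauge


# Type 37 for AFI 1 / SAFI 1 (IPv4 unicast) with a made-up gauge value.
tlv = encode_afi_safi_gauge(37, 1, 1, 1234)
decoded = decode_afi_safi_gauge(tlv)
```

The value is 11 bytes, so the whole TLV is 15 bytes, and a report covering
several address families carries one such TLV per AFI/SAFI pair.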



