Gunter Van de Velde has entered the following ballot position for
draft-ietf-grow-bmp-bgp-rib-stats-14: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to 
https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ 
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-grow-bmp-bgp-rib-stats/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# Gunter Van de Velde, RTG AD, comments for draft-ietf-grow-bmp-bgp-rib-stats-14

# The line numbers used are rendered from IETF idnits tool:
https://author-tools.ietf.org/api/idnits?url=https://www.ietf.org/archive/id/draft-ietf-grow-bmp-bgp-rib-stats-14.txt

# Many thanks for the RTGDIR review from Bruno and the shepherd writeup from
Job.

# Did i miss seeing a cross posting to IDR/BESS to understand if the various
suggested gauges definitions are accurately described and understood from
protocol perspective.

# DISCUSS
# =======

#1# the section "5.  Operational Considerations" seems to document a mix of
operational considerations (non BCP14 language required) and GMP protocol
formal procedures (BCP14 language is required). Can these two be untangled. It
will make it easier for implementors to do the correct implementation.

#2# In general i found the descriptions of most of the gauges for the newly
proposed statistics types not very accurately described. See my ""COMMENT""
section for input and overview. Too lengthy in the overview DISCUSS section

#3# some gauges seem duplicates from prior existing gauges. Not sure we need
two times the same gauge in different code-points. seems sub-optimal and error
prone.

#4# section 5 is named "Statistics Definition" and that seems not aa well
described title. Can this be something that better describes the content? for
example "RIB monitoring type statistics"

#5# it was unclear to me that what the document specifies is that the gauge
that is formalized in this document is not simply a single dimensional gauge
alone, but that the value transferred by BMP is a combination of "AFI + SAFI +
gauge". I think i missed seeing that explicitly mentioned in the document. 
Adding lengths (in general introduction section maybe to avoid repetition) of
each field would help making sure implementations interop well.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

# comments
# ========

19         This document defines new statistics type to monitor BMP Adj-RIB-In
20         and Adj-RIB-Out Routing Information Bases (RIBs).

GV> in the abstract is mentioned that the document defines new statistics (but
later is mentioned it are guages)

86         This document defines new gauges for BMP statistics message.

GV> The above does not fully align with what is written in the abstract, I
suspect you want to say:

"
This document defines gauges for new BMP statistics messages.
"

107        *  Pre-policy Adj-RIB-In: The result before applying the inbound
108           policy to an Adj-RIB-In.  Note that this aligns with the pre-
109           policy Adj-RIB-In concept specified in Section 2 of [RFC7854].

GV> Why is the text from RFC7854 not re-used? is there need for a new explicit
definition? GV> RFC7854 says:

"
   o  Adj-RIB-In: As defined in [RFC4271], "The Adj-RIBs-In contains
      unprocessed routing information that has been advertised to the
      local BGP speaker by its peers."  This is also referred to as the
      pre-policy Adj-RIB-In in this document.
"

127        *  Primary route: A route to a prefix that is considered the best
128           route by the BGP decision process [RFC4271] and actively used for
129           forwarding traffic to that prefix.

GV> is this accurate? is it not the BGP route that is selected by BGP for being
forwarded to its peers? There may be ECMP or uECMP routes actively used

131        *  Backup route: A backup route is eligible for route selection, but
132           it is not selected as the primary route and is also installed in
133           the Loc-RIB.  It is not used until all primary routes become
134           unreachable.  Backup routes are used for fast convergence in the
135           event of failures.

GV> here is the concept of "all primary routes" used, indicating more as a
single best route. Is this not contradicting the prior bullet point?

137     3.  Statistics Definition

GV> This title seems rather undescriptive. What about calling this section:

"
RIB monitoring type statistics
"

145        *  Type = 18: (64-bit Gauge) Current number of routes in pre-policy
146           Adj-RIB-In.  This gauge is similar to stats type 7 defined in
147           [RFC7854] and makes it explicitly for the pre-policy Adj-RIB-In.

GV> It is written that this is similar as stats type 7, but when looking at the
definitions in section 2 it is exactly the same. pre-existing stats type 7 is
exactly the same as the proposed stats type 18. Do we need type 18?

149        *  Type = 19: (64-bit Gauge) Current number of routes in per-Address
150           Family Identifier (AFI)/Subsequent Address Family Identifier
151           (SAFI) pre-policy Adj-RIB-In.  This gauge is similar to stats type
152           9 defined in Section 4.8 of [RFC7854] and makes it explicitly for
153           the pre-policy Adj-RIB-In.  The value is structured as: 2-byte
154           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> same observation as the prior item. The newly suggested type 19 is exactly
the same as type 9. Do we need this new gauge? GV> what exactly is the "value"?
Can the structure of the field be more clarified? how how is the field encoded?
it seems more as a single dimensional 64 bit gauge.

GV> first time usage of the AFI/SAFI in this document and adding a reference
can be handy. Also maybe a list of AFI/SAFI this is intended for if this is
only for a subset of them.

159        *  Type = 21: (64-bit Gauge) Current number of routes in per-AFI/SAFI
160           post-policy Adj-RIB-In.  The value is structured as: 2-byte AFI,
161           1-byte SAFI, followed by a 64-bit Gauge.

GV> what exactly is the "value"? Can the structure of the field be more
clarified? how how is the field encoded? it seems more as a single dimensional
64 bit gauge.

163        *  Type = 22: (64-bit Gauge) Current number of routes in per-AFI/SAFI
164           rejected by inbound policy.  This gauge is different from stats
165           type 0 defined in Section 4.8 of [RFC7854].  The stats type 0 is a
166           32-counter which is a monotonically increasing number and doesn't
167           represent the current number of routes rejected by an inbound
168           policy due to ongoing configuration changes.  The value is
169           structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
170           Gauge.

GV> If over time more and more routes are rejected, then how can the number of
rejected routes go ever go down? its an increasing number. Unless there is
assumption that there is accounting for the changing number of routes
received/withdrawn by a peer and it is the number of routes that were rejected
from the number of routes received. This may need more accurate definition of
what exactly is being measured and what reference is used.

172        *  Type = 23: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
173           accepted by inbound policy.  The value is structured as: 2-byte
174           AFI, 1-byte SAFI, followed by a 64-bit Gauge.  Some
175           implementations, or configurations in implementations, may discard
176           routes that do not match policy and thus the accepted count (type
177           23) and the Adj-RIB-In counts (type 21) will be identical in such
178           cases.

GV> not sure what is the text starting with "Some implementations, or ..." helps
with the formal definition of the field. It is useful from operational
perspective, but it convolutes the formal part of the definition of the field
itself. Maybe move to operational implication section

180        *  Type = 24: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
181           selected as primary route.  The value is structured as: 2-byte
182           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> the primary route is the route forwarding traffic? does this include all
ECMP and uECMP paths. BGP will only fwd the best BGP Path, but it may use more
as a single path for forwarding

184        *  Type = 25: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
185           selected as a backup route.  The value is structured as: 2-byte
186           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> does this include all routes that are not the BGP best path or only the
routes that are not used for forwarding? What makes a route a "backup" route.

195        *  Type = 27: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
196           marked as stale by Graceful Restart (GR) events.  The value is
197           structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
198           Gauge.  'Stale' refers to a path which has been declared stale by
199           the BGP GR mechanism as described in Section 4.1 of [RFC4724].

GV> GR events happen when a CPM moves from a primary unit to a standby
unit/process. Such involves significant processing. Hence i wonder how mush
operational value this brings, or if would make the GR event worse then it
already is.

201        *  Type = 28: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
202           marked as stale by Long-Lived Graceful Restart (LLGR).  The value
203           is structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
204           Gauge.  'Stale' refers to a path which has been declared stale by
205           the BGP LLGR mechanism as described in Section 4.3 of [RFC9494].

GV> see prior comments

211        *  Type = 30: (64-bit Gauge) Current Number of routes per-AFI/SAFI
212           left until reaching the received route threshold which corresponds
213           to the upper bound of accepted routes per Section 6.7 of
214           [RFC4271].  The value is structured as: 2-byte AFI, 1-byte SAFI,
215           followed by a 64-bit Gauge.

GV> Is this accurate? multiprotocol extensions are described in RFC4760 and not
in RFC4271. It is unclear how this counter referencing rfc4271 is to be applied
to rfc4760 when multiple afi/safi may be received from a single peer.

217        *  Type = 31: (64-bit Gauge) Current Number of routes left until
218           reaching a license-customized route threshold.  This value is
219           affected by whether a customized license exists, and when the
220           customized license is installed.

GV> This may be a soft threshold and in addition may be enforced outside the
router knowledge.

222        *  Type = 32: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
223           left until reaching a license-customized route threshold.  This
224           value is affected by whether a customized license exists for the
225           relevant address family, and when the customized license is
226           installed.  The value is structured as: 2-byte AFI, 1-byte SAFI,
227           followed by a 64-bit Gauge.

GV> This may be a soft threshold and in addition may be enforced outside the
router knowledge.

264        *  Type = 39: (64-bit Gauge) Current number of routes refused to be
265           sent by exceeding the maximum AS_PATH length supported by the
266           local configuration.

GV> can this be more accurate described? Is it "refused to be sent" or simply
"not sent" because route AS_PATH is longer as max AS_PATH length towards the
peer?

268        *  Type = 40: (64-bit Gauge) Current number of routes in per-AFI/SAFI
269           refused to be sent by exceeding the maximum AS_PATH length
270           supported by the local configuration.  The value is structured as:
271           2-byte AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> See prior comment. I do not think 'refused' is the most accurate word to
use.... maybe filtered is a better term to use?

328     5.  Operational Considerations

GV> Some of the definitions earlier have operational concerns included and are
maybe better added to the operational implication section.

330        This document defines new gauges for BMP statistics messages.  The

GV> i think more accurate would be that the document specifies gauges for "new
BMP statistics".

333        implementation-dependent.  Implementations SHOULD determine
334        appropriate report generation and delivery strategies, including
335        configurable timing intervals and threshold values.  The mechanism
336        for controlling the reporting of new gauges SHOULD be consistent with
337        that of existing types.  Implementations SHOULD also support per-
338        router configuration of statistic subsets for collection and
339        reporting.

GV> Why is this uppercase SHOULD? Is there a procedure that breaks? lowercase
seems sufficient as its documenting good behavior.

341        Some statistics are dependent on feature configurations, such as GR,
342        LLGR, and RPKI, so the corresponding statistics are only sent when
343        these features are enabled.  This statistics include Type 24, 25, 26,

GV> From operational perspective sending BGP Stats during a GR may impact the
GR event due to additional processing and dynamics. That is an operational
concern.

351        Certain statistics may have logical relationships (e.g., per-AFI/SAFI
352        counts summing to global totals).  Implementations MAY perform
353        consistency checks but MUST NOT assume strict dependencies (due to
354        potential race conditions or partial failures).  Discrepancies (e.g.,
355        sum(per-AFI/SAFI) != global count) SHOULD be logged as warnings but
356        MUST NOT disrupt protocol operation.

GV> not convinced these need to be BCP14 language. is BCP14 language required?

358        For backward compatibility, and absent policy otherwise, it is
359        RECOMMENDED that monitored routers capable of generating both (Type 7
360        and Type 18) or (Type 9 and Type 19) BMP statistics SHOULD transmit
361        both corresponding types simultaneously.  This allows monitoring
362        stations to process either format according to their needs without
363        disrupting existing implementations that rely on Type 7 or Type 9.

GV> In what way are The new types different from the prior types. its the exact
same value representing the exact same property.

369        Counters may reset due to session restart, manual clearance, or
370        overflow.  Implementations MUST track discontinuities and log this
371        information.

GV> This document specifies gauges, not counters. Is this accurate usage of
words? is BCP14 language correct? seems not to be about formal protocol
procedure

373        Operators MAY consider rate-limiting statistic updates to minimize
374        performance impact on control-plane processes.  Operators SHOULD
375        enable only necessary statistics to reduce memory and CPU overhead.

GV> lowercase should/may seems sufficient

Many thanks for this document,

Kind Regards,
Gunter Van de Velde
RTG Area Director



_______________________________________________
GROW mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to