Hi Gunter

Sharing my inline response on some of the comments below as [MS].

Thanks
Mukul
From: Gunter Van de Velde via Datatracker <[email protected]>
Date: Friday, November 14, 2025 at 11:34 AM
To: The IESG <[email protected]>
Cc: [email protected] 
<[email protected]>, [email protected] 
<[email protected]>, [email protected] <[email protected]>, [email protected] 
<[email protected]>
Subject: Gunter Van de Velde's Discuss on draft-ietf-grow-bmp-bgp-rib-stats-14: 
(with DISCUSS and COMMENT)

Gunter Van de Velde has entered the following ballot position for
draft-ietf-grow-bmp-bgp-rib-stats-14: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to 
https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-grow-bmp-bgp-rib-stats/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# Gunter Van de Velde, RTG AD, comments for draft-ietf-grow-bmp-bgp-rib-stats-14

# The line numbers used are rendered from IETF idnits tool:
https://author-tools.ietf.org/api/idnits?url=https://www.ietf.org/archive/id/draft-ietf-grow-bmp-bgp-rib-stats-14.txt

# Many thanks for the RTGDIR review from Bruno and the shepherd writeup from
Job.

# Did I miss a cross-posting to IDR/BESS to check whether the various
suggested gauge definitions are accurately described and understood from
a protocol perspective?

# DISCUSS
# =======

#1# The section "5.  Operational Considerations" seems to document a mix of
operational considerations (no BCP14 language required) and formal BMP protocol
procedures (BCP14 language is required). Can these two be untangled? It
will make it easier for implementors to produce a correct implementation.
[MS] I am not clear on what needs to be untangled or how we want to word this.

#2# In general, I found the descriptions of most of the gauges for the newly
proposed statistics types not very accurate. See my "COMMENT"
section for input and an overview; the details are too lengthy for the DISCUSS section.

#3# Some gauges seem to be duplicates of existing gauges. I am not sure we need
the same gauge twice under different code points. It seems sub-optimal and
error-prone.

#4# Section 3 is named "Statistics Definition", and that does not seem to be a
very descriptive title. Can this be something that better describes the content? For
example "RIB monitoring type statistics".
[MS] I feel "Statistics Definition" is an appropriate generic section title.
It is followed by two sub-titles, "Adj-RIB-In Statistics Definition" and
"Adj-RIB-Out Statistics Definition", which align well with the "Statistics
Definition" title.

#5# it was unclear to me that what the document specifies is that the gauge
that is formalized in this document is not simply a single dimensional gauge
alone, but that the value transferred by BMP is a combination of "AFI + SAFI +
gauge". I think i missed seeing that explicitly mentioned in the document.
Adding lengths (in general introduction section maybe to avoid repetition) of
each field would help making sure implementations interop well.
[MS] A gauge is a numeric (64-bit integer) value. The "AFI + SAFI" is the
additional encoding that goes in the BMP stats data. As described in RFC 7854,
a BMP statistics message is encoded like this:

  * BMP header + BMP peer header + Stats Count.
  * The Stats Count is followed by that many TLVs (Stat Type, Stat Length,
    Value -> Stat Data).
  * For the per-AFI/SAFI types, the Stat Data is encoded as "AFI + SAFI" +
    "64-bit gauge". This is what the doc refers to as "Value".

Note that the wording is the same as in BMP rib-out RFC 8671. It is also used
in BMP loc-rib RFC 9069.
A BMP background is probably assumed in this draft.
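
To make the "Value" layout concrete, here is a minimal Python sketch of one
per-AFI/SAFI statistic TLV as described above: 2-byte Stat Type, 2-byte Stat
Length, then 2-byte AFI, 1-byte SAFI and a 64-bit gauge. The function names are
made up for illustration and are not taken from the draft or from any BMP
implementation.

```python
import struct

# Stat TLV header: 2-byte Stat Type, 2-byte Stat Length (network byte order).
_TLV_HEADER = struct.Struct("!HH")
# Per-AFI/SAFI Stat Data: 2-byte AFI, 1-byte SAFI, 64-bit gauge (11 bytes).
_AFI_SAFI_GAUGE = struct.Struct("!HBQ")


def encode_afi_safi_gauge(stat_type, afi, safi, gauge):
    """Build one statistic TLV whose value is AFI + SAFI + 64-bit gauge."""
    data = _AFI_SAFI_GAUGE.pack(afi, safi, gauge)
    return _TLV_HEADER.pack(stat_type, len(data)) + data


def decode_afi_safi_gauge(tlv):
    """Parse one statistic TLV and return (stat_type, afi, safi, gauge)."""
    stat_type, stat_len = _TLV_HEADER.unpack_from(tlv, 0)
    if stat_len != _AFI_SAFI_GAUGE.size:
        raise ValueError("unexpected Stat Length %d for an AFI/SAFI gauge" % stat_len)
    afi, safi, gauge = _AFI_SAFI_GAUGE.unpack_from(tlv, _TLV_HEADER.size)
    return stat_type, afi, safi, gauge
```

For example, `encode_afi_safi_gauge(21, 1, 1, 1000)` yields a 15-byte TLV: a
4-byte header followed by 11 bytes of Stat Data.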


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

# comments
# ========

19         This document defines new statistics type to monitor BMP Adj-RIB-In
20         and Adj-RIB-Out Routing Information Bases (RIBs).

GV> The abstract mentions that the document defines new statistics (but
later it is mentioned that they are gauges)

86         This document defines new gauges for BMP statistics message.

GV> The above does not fully align with what is written in the abstract, I
suspect you want to say:

"
This document defines gauges for new BMP statistics messages.
"
[MS] The abstract is usually a high-level thing, so it is mentioned as
"statistics". It could be a "32-bit count" or a "64-bit gauge". The statistics
definition clarifies that it is a "gauge". We can update the doc as you said:
"This document defines gauges for new BMP statistics messages."

107        *  Pre-policy Adj-RIB-In: The result before applying the inbound
108           policy to an Adj-RIB-In.  Note that this aligns with the pre-
109           policy Adj-RIB-In concept specified in Section 2 of [RFC7854].

GV> Why is the text from RFC7854 not re-used? Is there a need for a new explicit
definition?
GV> RFC7854 says:

"
   o  Adj-RIB-In: As defined in [RFC4271], "The Adj-RIBs-In contains
      unprocessed routing information that has been advertised to the
      local BGP speaker by its peers."  This is also referred to as the
      pre-policy Adj-RIB-In in this document.
"
[MS] I think this was a specific comment from Paulo & others to make it
explicit and say pre-policy. RFC 7854 defines Adj-RIB-In and says it is
referred to as the pre-policy Adj-RIB-In. I feel it is OK to have this, and an
implementation may continue using stats type 7 if that is deemed appropriate.

127        *  Primary route: A route to a prefix that is considered the best
128           route by the BGP decision process [RFC4271] and actively used for
129           forwarding traffic to that prefix.

GV> is this accurate? is it not the BGP route that is selected by BGP for being
forwarded to its peers? There may be ECMP or uECMP routes actively used

131        *  Backup route: A backup route is eligible for route selection, but
132           it is not selected as the primary route and is also installed in
133           the Loc-RIB.  It is not used until all primary routes become
134           unreachable.  Backup routes are used for fast convergence in the
135           event of failures.

GV> Here the concept of "all primary routes" is used, indicating more than a
single best route. Does this not contradict the prior bullet point?

[MS] I think we need some clarification here, and we can update the doc if
there is agreement.
My understanding is that for a given prefix, only one route can be marked as
the active route. ECMP applies to the forwarding layer only. When ECMP is
present, the active route can have multiple next hops (ECMP) installed in the
FIB to forward the traffic. From this BMP statistics draft's point of view, to
keep things generic, I would suggest defining a primary route as "A route that
is marked as active by the local BGP protocol". Backup paths are all paths that
are "not primary routes". When we bring in forwarding concepts, things might
get confusing.
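
As a rough illustration of the suggestion above (primary = marked active,
backup = everything else), the two gauges could be derived per AFI/SAFI as in
the Python sketch below. This only reflects the wording proposed in this reply,
not the definition currently in the draft, and the tuple layout of `paths` is
hypothetical.

```python
from collections import Counter


def count_primary_and_backup(paths):
    """Count primary and backup paths per (AFI, SAFI).

    `paths` is an iterable of (afi, safi, prefix, is_active) tuples; this
    layout is an assumption made for the example only.
    """
    primary = Counter()
    backup = Counter()
    for afi, safi, _prefix, is_active in paths:
        if is_active:
            # The single path marked active for this prefix.
            primary[(afi, safi)] += 1
        else:
            # Under the proposed wording, every non-active path is a backup.
            backup[(afi, safi)] += 1
    return primary, backup
```

With three paths for one IPv4 unicast prefix, one of them active, this gives a
primary count of 1 and a backup count of 2 for AFI 1 / SAFI 1, regardless of
how many next hops the forwarding layer installs.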

137     3.  Statistics Definition

GV> This title seems rather undescriptive. What about calling this section:

"
RIB monitoring type statistics
"

145        *  Type = 18: (64-bit Gauge) Current number of routes in pre-policy
146           Adj-RIB-In.  This gauge is similar to stats type 7 defined in
147           [RFC7854] and makes it explicitly for the pre-policy Adj-RIB-In.

GV> It is written that this is similar to stats type 7, but when looking at the
definitions in section 2 it is exactly the same. The pre-existing stats type 7 is
exactly the same as the proposed stats type 18. Do we need type 18?
[MS] As mentioned before, this was done based on explicit comment from Paulo 
and others. I have mentioned my other thoughts above about this topic.

149        *  Type = 19: (64-bit Gauge) Current number of routes in per-Address
150           Family Identifier (AFI)/Subsequent Address Family Identifier
151           (SAFI) pre-policy Adj-RIB-In.  This gauge is similar to stats type
152           9 defined in Section 4.8 of [RFC7854] and makes it explicitly for
153           the pre-policy Adj-RIB-In.  The value is structured as: 2-byte
154           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> Same observation as the prior item. The newly suggested type 19 is exactly
the same as type 9. Do we need this new gauge?
GV> What exactly is the "value"? Can the structure of the field be clarified
further? How is the field encoded? It seems more like a single-dimensional
64-bit gauge.
[MS] Same comment as above. The "value" is the "Value -> Stats Data" mentioned
in the BMP statistics message encoding explained above.

GV> first time usage of the AFI/SAFI in this document and adding a reference
can be handy. Also maybe a list of AFI/SAFI this is intended for if this is
only for a subset of them.
[MS] This is all AFI/SAFI that a BGP peer supports. There is no list that needs 
to be mentioned here.

159        *  Type = 21: (64-bit Gauge) Current number of routes in per-AFI/SAFI
160           post-policy Adj-RIB-In.  The value is structured as: 2-byte AFI,
161           1-byte SAFI, followed by a 64-bit Gauge.

GV> What exactly is the "value"? Can the structure of the field be clarified
further? How is the field encoded? It seems more like a single-dimensional
64-bit gauge.
[MS] The “Value” is explained above.

163        *  Type = 22: (64-bit Gauge) Current number of routes in per-AFI/SAFI
164           rejected by inbound policy.  This gauge is different from stats
165           type 0 defined in Section 4.8 of [RFC7854].  The stats type 0 is a
166           32-counter which is a monotonically increasing number and doesn't
167           represent the current number of routes rejected by an inbound
168           policy due to ongoing configuration changes.  The value is
169           structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
170           Gauge.

GV> If over time more and more routes are rejected, then how can the number of
rejected routes ever go down? It is an increasing number. Unless there is an
assumption that there is accounting for the changing number of routes
received/withdrawn by a peer and it is the number of routes that were rejected
from the number of routes received. This may need a more accurate definition of
what exactly is being measured and what reference is used.
[MS] The number of rejected routes can change based on policy configuration.
RIB-in is associated with import policy, while RIB-out is associated with
export policy. So we are measuring the effect of policy configuration.
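
To illustrate the difference between the type 0 style counter and the type 22
gauge, here is a small Python sketch. The class and the way policy is modelled
(a callable returning True for accepted prefixes) are assumptions for the
example, and exactly when an implementation increments the counter is
implementation-specific.

```python
class InboundPolicyStats:
    """Tracks a monotonic rejection counter and a current-rejection gauge."""

    def __init__(self):
        self.rejected_counter = 0      # counter: only ever increases
        self._rejected_now = set()     # prefixes rejected by the current policy

    def evaluate(self, received_prefixes, policy):
        """Re-run the inbound policy over the routes currently received."""
        rejected = {p for p in received_prefixes if not policy(p)}
        # Counter: add only the routes that were not already rejected.
        self.rejected_counter += len(rejected - self._rejected_now)
        # Gauge source: replace the set, so the value can go down again.
        self._rejected_now = rejected

    @property
    def rejected_gauge(self):
        """Current number of routes rejected by inbound policy."""
        return len(self._rejected_now)
```

If a policy that rejects three routes is later relaxed to accept everything,
`rejected_counter` stays at 3 while `rejected_gauge` drops to 0, which is the
effect of the policy change that the gauge is meant to expose.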

172        *  Type = 23: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
173           accepted by inbound policy.  The value is structured as: 2-byte
174           AFI, 1-byte SAFI, followed by a 64-bit Gauge.  Some
175           implementations, or configurations in implementations, may discard
176           routes that do not match policy and thus the accepted count (type
177           23) and the Adj-RIB-In counts (type 21) will be identical in such
178           cases.

GV> Not sure how the text starting with "Some implementations, or ..." helps
with the formal definition of the field. It is useful from an operational
perspective, but it complicates the formal part of the definition of the field
itself. Maybe move it to the operational considerations section.
[MS] The text was added as part of a review comment.

180        *  Type = 24: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
181           selected as primary route.  The value is structured as: 2-byte
182           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> Is the primary route the route forwarding traffic? Does this include all
ECMP and uECMP paths? BGP will only forward the best BGP path, but it may use
more than a single path for forwarding.

184        *  Type = 25: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
185           selected as a backup route.  The value is structured as: 2-byte
186           AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> does this include all routes that are not the BGP best path or only the
routes that are not used for forwarding? What makes a route a "backup" route.
[MS] Explained my thought above.

195        *  Type = 27: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
196           marked as stale by Graceful Restart (GR) events.  The value is
197           structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
198           Gauge.  'Stale' refers to a path which has been declared stale by
199           the BGP GR mechanism as described in Section 4.1 of [RFC4724].

GV> GR events happen when a CPM moves from a primary unit to a standby
unit/process. This involves significant processing. Hence I wonder how much
operational value this brings, or whether it would make the GR event worse
than it already is.
[MS] This is just a statistic sent to the collector. BMP stats do not interfere
with GR processing. These counters are created by the local BGP process after
processing. So I am not clear about this comment.

201        *  Type = 28: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
202           marked as stale by Long-Lived Graceful Restart (LLGR).  The value
203           is structured as: 2-byte AFI, 1-byte SAFI, followed by a 64-bit
204           Gauge.  'Stale' refers to a path which has been declared stale by
205           the BGP LLGR mechanism as described in Section 4.3 of [RFC9494].

GV> see prior comments

211        *  Type = 30: (64-bit Gauge) Current Number of routes per-AFI/SAFI
212           left until reaching the received route threshold which corresponds
213           to the upper bound of accepted routes per Section 6.7 of
214           [RFC4271].  The value is structured as: 2-byte AFI, 1-byte SAFI,
215           followed by a 64-bit Gauge.

GV> Is this accurate? multiprotocol extensions are described in RFC4760 and not
in RFC4271. It is unclear how this counter referencing rfc4271 is to be applied
to rfc4760 when multiple afi/safi may be received from a single peer.

217        *  Type = 31: (64-bit Gauge) Current Number of routes left until
218           reaching a license-customized route threshold.  This value is
219           affected by whether a customized license exists, and when the
220           customized license is installed.

GV> This may be a soft threshold and in addition may be enforced outside the
router knowledge.

222        *  Type = 32: (64-bit Gauge) Current Number of routes in per-AFI/SAFI
223           left until reaching a license-customized route threshold.  This
224           value is affected by whether a customized license exists for the
225           relevant address family, and when the customized license is
226           installed.  The value is structured as: 2-byte AFI, 1-byte SAFI,
227           followed by a 64-bit Gauge.

GV> This may be a soft threshold and in addition may be enforced outside the
router knowledge.

264        *  Type = 39: (64-bit Gauge) Current number of routes refused to be
265           sent by exceeding the maximum AS_PATH length supported by the
266           local configuration.

GV> Can this be described more accurately? Is it "refused to be sent" or simply
"not sent" because the route's AS_PATH is longer than the maximum AS_PATH
length towards the peer?

268        *  Type = 40: (64-bit Gauge) Current number of routes in per-AFI/SAFI
269           refused to be sent by exceeding the maximum AS_PATH length
270           supported by the local configuration.  The value is structured as:
271           2-byte AFI, 1-byte SAFI, followed by a 64-bit Gauge.

GV> See prior comment. I do not think 'refused' is the most accurate word to
use.... maybe filtered is a better term to use?
[MS] We can update that.

328     5.  Operational Considerations

GV> Some of the definitions earlier have operational concerns included and are
maybe better added to the operational implication section.

330        This document defines new gauges for BMP statistics messages.  The

GV> i think more accurate would be that the document specifies gauges for "new
BMP statistics".

333        implementation-dependent.  Implementations SHOULD determine
334        appropriate report generation and delivery strategies, including
335        configurable timing intervals and threshold values.  The mechanism
336        for controlling the reporting of new gauges SHOULD be consistent with
337        that of existing types.  Implementations SHOULD also support per-
338        router configuration of statistic subsets for collection and
339        reporting.

GV> Why is this uppercase SHOULD? Is there a procedure that breaks? lowercase
seems sufficient as its documenting good behavior.
[MS] We can update that.

341        Some statistics are dependent on feature configurations, such as GR,
342        LLGR, and RPKI, so the corresponding statistics are only sent when
343        these features are enabled.  This statistics include Type 24, 25, 26,

GV> From operational perspective sending BGP Stats during a GR may impact the
GR event due to additional processing and dynamics. That is an operational
concern.

351        Certain statistics may have logical relationships (e.g., per-AFI/SAFI
352        counts summing to global totals).  Implementations MAY perform
353        consistency checks but MUST NOT assume strict dependencies (due to
354        potential race conditions or partial failures).  Discrepancies (e.g.,
355        sum(per-AFI/SAFI) != global count) SHOULD be logged as warnings but
356        MUST NOT disrupt protocol operation.

GV> not convinced these need to be BCP14 language. is BCP14 language required?
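
For reference, the behaviour the quoted paragraph describes (check the sum,
log a warning, keep going) might look like the following Python sketch on the
receiving side. The function name, logger name and dictionary layout are
assumptions for illustration only.

```python
import logging

logger = logging.getLogger("bmp.stats")


def check_consistency(global_count, per_afi_safi):
    """Compare a global gauge with the sum of its per-AFI/SAFI gauges.

    `per_afi_safi` maps (afi, safi) tuples to gauge values. A mismatch is
    only logged; the caller keeps processing either way.
    """
    total = sum(per_afi_safi.values())
    if total != global_count:
        logger.warning(
            "sum(per-AFI/SAFI)=%d differs from global count=%d", total, global_count
        )
        return False
    return True
```

The return value is informational; nothing is raised, so a transient mismatch
caused by a race between two reports does not interrupt message handling.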

358        For backward compatibility, and absent policy otherwise, it is
359        RECOMMENDED that monitored routers capable of generating both (Type 7
360        and Type 18) or (Type 9 and Type 19) BMP statistics SHOULD transmit
361        both corresponding types simultaneously.  This allows monitoring
362        stations to process either format according to their needs without
363        disrupting existing implementations that rely on Type 7 or Type 9.

GV> In what way are the new types different from the prior types? It is the exact
same value representing the exact same property.
[MS] This is an attempt to make the counters explicit.

369        Counters may reset due to session restart, manual clearance, or
370        overflow.  Implementations MUST track discontinuities and log this
371        information.

GV> This document specifies gauges, not counters. Is this accurate usage of
words? is BCP14 language correct? seems not to be about formal protocol
procedure

373        Operators MAY consider rate-limiting statistic updates to minimize
374        performance impact on control-plane processes.  Operators SHOULD
375        enable only necessary statistics to reduce memory and CPU overhead.

GV> lowercase should/may seems sufficient
[MS] We can update that.

Many thanks for this document,

Kind Regards,
Gunter Van de Velde
RTG Area Director



_______________________________________________
GROW mailing list -- [email protected]
To unsubscribe send an email to [email protected]
