moonchen opened a new pull request, #13343:
URL: https://github.com/apache/trafficserver/pull/13343

   ## Summary
   
   `RateLimiter::_metrics` (in `plugins/experimental/rate_limit/limiter.h`) is 
value-initialized to `0` and is only populated by `initializeMetrics()`, which 
`parseYaml()` calls **only when a selector carries a `metrics:` block**. But 
`incrementMetric()` skips the stat update only when the entry equals `TS_ERROR` 
(`-1`):
   
   ```cpp
   std::array<int, RATE_LIMITER_METRIC_MAX> _metrics{};   // value-initialized 
to 0
   
   void incrementMetric(uint metric) {
     if (_metrics[metric] != TS_ERROR) {                  // 0 != -1  → true → 
does NOT skip
       TSStatIntIncrement(_metrics[metric], 1);           // 
TSStatIntIncrement(0, 1)
     }
   }
   ```
   
   So for any rate-limit selector configured **without** a `metrics:` block, 
the guard never skips, and every `incrementMetric()` call on the queue / reject 
/ expire / resume paths runs `TSStatIntIncrement(0, 1)`.
   
   These call sites are unconditional (not gated on metrics being configured):
   - `sni_limiter.cc` — `RATE_LIMITER_METRIC_QUEUED`, 
`RATE_LIMITER_METRIC_REJECTED`
   - `sni_selector.cc` — `RATE_LIMITER_METRIC_RESUMED`, 
`RATE_LIMITER_METRIC_EXPIRED`
   - `txn_limiter.cc` — `QUEUED`, `REJECTED`, `RESUMED`, `EXPIRED`
   
   `TSStatIntIncrement(0)` does not assert, because `ts::Metrics::Storage`'s 
constructor reserves id `0` as `proxy.process.api.metrics.bad_id` ("Reserve 
slot 0 for errors, this should always be 0"), so `valid(0)` is always true. The 
effect is therefore **silent**: that reserved id-0 metric is incremented on 
every queue/reject/expire/resume for a selector with no `metrics:` block — an 
unrelated global counter slowly drifts upward.
   
   ## Reproduction
   
   A selector with a limit/queue but no `metrics:` block, driven past its limit:
   
   ```yaml
   selector:
     - name: example.com
       limit: 1
       queue: { size: 10, max_age: 1 }
   ```
   
   Send more than `limit` concurrent handshakes to `example.com`; each 
queued/rejected (and later expired/resumed) connection bumps 
`proxy.process.api.metrics.bad_id`.
   
   ## Fix
   
   Fill `_metrics` with `TS_ERROR` in the constructor — the same "not 
registered" marker `metric_helper()` already writes per element — so 
`incrementMetric()` is a true no-op until metrics are actually registered. One 
line, robust to the metric count.
   
   ## Testing
   
   - The equivalent fix (initialize the IDs to `TS_ERROR`) was validated under 
AddressSanitizer on a 10.x branch: the same selector-without-`metrics:` path 
aborts under ASAN before the change and passes cleanly after.
   - This PR's change has not been built locally against `master`; relying on 
CI for the compile.
   
   ---
   
   _Unrelated observation while investigating (not addressed here): on the 
configured-metrics path, `parseYaml()` calls 
`initializeMetrics(RATE_LIMITER_TYPE_SNI, prefix, tag)` but the signature is 
`initializeMetrics(uint type, std::string tag, std::string prefix = ...)`, so 
`prefix` and `tag` appear to be passed swapped — affecting metric **names** 
when a `metrics:` block is present. Happy to file separately if it's a real 
issue._
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to