moonchen opened a new pull request, #13219:
URL: https://github.com/apache/trafficserver/pull/13219

   ## Summary
   
   The `proxy.config.memory.max_usage` connection throttle and the
   `proxy.process.traffic_server.memory.rss` gauge are both driven by the
   `MemoryLimit` continuation in `src/traffic_server/traffic_server.cc`, and 
both
   were broken. This fixes all three defects.
   
   ### 1. The throttle was silently dead (~7 years)
   
   `MemoryLimit` sets the global `net_memory_throttle` flag when RSS exceeds
   `max_usage`, but the only reader was in the accept path and it was **removed 
in
   2018** (`4bfaf3645`, "Make throttling feature more useful."). Since then the
   flag has been written and never consulted — `check_net_throttle()` only 
honored
   the connection-*count* throttle. So configuring 
`proxy.config.memory.max_usage`
   had no effect on connection acceptance at all.
   
   Restored by honoring `net_memory_throttle` in `check_net_throttle(ACCEPT)` 
(one
   spot, so every accept site picks it up). Gated on `ACCEPT` only: outbound
   `CONNECT`s keep flowing so in-flight transactions can complete and release
   memory.
   
   ### 2. The metric was frozen on default configs (freeze bug)
   
   `MemoryLimit` stopped updating the gauge whenever `max_usage <= 0` (the
   default): the periodic handler did `e->cancel(); delete this;`, so the gauge 
was
   written exactly once early in startup and then **never updated again**. In
   production it was observed frozen at ~1.4 GiB while real RSS was ~375 GiB.
   
   Now the current RSS is published every period, decoupled from the throttle
   feature, so the metric stays fresh regardless of whether the throttle is
   enabled.
   
   ### 3. Wrong source / units / semantics (peak vs current, platform bug)
   
   It stored `getrusage(RUSAGE_SELF).ru_maxrss << 10`:
   
   - `ru_maxrss` is **peak** RSS (never decreases) — wrong for a gauge named
     `.rss`, and it made the throttle **latch on** once peak crossed the limit 
and
     never release.
   - `ru_maxrss` is **KiB on Linux** (so `<<10` is correct there) but **bytes on
     macOS/BSD**, so the shift was wrong off-Linux.
   
   Reads current RSS portably via a new `ink_get_current_rss()` helper in
   `tscore/ink_sys_control` — Linux `/proc/self/statm` (`resident pages *
   sysconf(_SC_PAGESIZE)`), macOS `task_info(... MACH_TASK_BASIC_INFO ...)`. The
   throttle now compares current (not peak) RSS in bytes against `max_usage` 
(which
   is documented in bytes), so it both engages and **releases** correctly.
   
   ## Why the two fixes are coupled
   
   Reviving the throttle requires the current-RSS fix: with peak RSS the flag 
would
   latch on forever and the server would refuse all new connections permanently.
   Current RSS is what lets the throttle clear when memory actually drops.
   
   ## Testing
   
   - New unit test `src/tscore/unit_tests/test_ink_sys_control.cc`: asserts
     `ink_get_current_rss()` is plausible and **grows** after touching a 64 MiB
     allocation.
   - Builds clean (`traffic_server`, `inknet`, `test_tscore`); pre-commit format
     hooks (clang-format / cmake-format / whitespace) pass.
   - Docs updated for the metric (current vs peak, refreshed every 10s).
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to