wu-sheng opened a new pull request, #13865:
URL: https://github.com/apache/skywalking/pull/13865
### Fix two related issues in the live debugger surface for unreleased
SWIP-13 code
**Bug 1 — terminal MAL `output` sample shows the wrong shape in the UI.**
Operators inspecting a `histogram_percentile([...])` rule see bucket-keyed
samples (`(service_name=…, le=10)`, `…, le=20`, …) in the value column
instead
of the percentile result they expect to see in storage (`{p=50}`, `{p=90}`,
`{p=99}`).
Two reasons: (1) the captured `output` sample never carried the metric's
actual
reading on `payload.value` — the wire only had `metric` / `entity` /
`valueType`
/ `timeBucket`, so the UI fell back to rendering the upstream
`histogram_percentile` function-stage `SampleFamily`, which legitimately has
`le=` keyed samples but isn't what the rule emits. (2) Even if we add
`payload.value`,
two-phase functions (`AvgHistogramPercentileFunction`,
`SumHistogramPercentileFunction`)
populate their user-visible field (`percentileValues`) lazily in
`calculate()`,
which the L1 streaming pipeline calls *after* the `captureMeterEmit` probe
site,
so a naive `getValue()` read returns the empty pre-calculate map.
This PR fixes both:
- New `appendValue(JsonObject, AcceptableValue<?>)` in `MALDebugRecorderImpl`
renders `payload.value` per holder type — JSON number for `LongValueHolder`
/ `IntValueHolder` / `DoubleValueHolder` (with `\"NaN\"` / `\"Infinity\"`
string fall-back for non-finite doubles so the wire stays valid JSON), and
`JsonObject {key: long}` for `LabeledValueHolder` (label combos for
`*Labeled` functions, `p=<rank>` entries for percentile functions).
- Forces `Metrics.calculate()` once before the `getValue()` read. Safe
because
the function's `isCalculated` guard makes the streaming pipeline's later
call a no-op, and cross-node `combine()` resets `isCalculated=false` so
post-merge state still recomputes on read.
Both edits are inside `MALDebugRecorderImpl`, which is only reached from the
gated `captureMeterEmit` probe — when no operator has installed a session the
probe gate is JIT-elided on hot paths, so steady-state runtime cost is
**zero**.
**Bug 2 — `recordCap` ceiling is too generous for an operator-driven
debugger.**
Hard cap was `10000` records / session, default `1000`. Both burn UI page
real estate and heap (per-record JSON ≈ 10 KiB → ~100 MiB worst case per
session). Operators inspect a handful of executions, not a paginated
firehose.
This PR drops both to `100` (default = hard cap = `MAX_RECORD_CAP`),
yielding ~1 MiB worst-case heap per session. Out-of-range requests still get
`400 invalid_limits` from the constructor.
- [x] Add a unit test to verify that the fix works.
- All 16 existing `dsl-debugging` unit tests pass against the new code; no
fixture broke. The cap-aware test
(`DebugSessionRegistryTest.reapExpired_…`)
was switched to `SessionLimits.MAX_RECORD_CAP` so it tracks the cap going
forward instead of pinning a stale literal.
- The MAL e2e flow now asserts `payload.value` is present and numeric on
every terminal `output` sample.
- Live-verified against a local stack feeding a `histogram_percentile`
rule: terminal sample now reports
`value: {\"{p=50}\": 250000, \"{p=90}\": 5000000, \"{p=99}\": 10000000}`.
- [x] Explain briefly why the bug exists and how to fix it.
- See above. Both bugs are debug-only paths; fix is debug-only and gated.
- [ ] If this pull request closes/resolves/fixes an existing issue, replace
the issue number. Closes #<issue number>.
- [x] Update the [`CHANGES`
log](https://github.com/apache/skywalking/blob/master/docs/en/changes/changes.md).
- Not required: this patches unreleased SWIP-13 code that's already in the
10.5.0 changelog entry. The wire/cap changes won't be visible to any
released version.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]