[PR] [VL] Add per-batch input-encoding counter to VeloxHashShuffleWriter [gluten]

via GitHub Mon, 18 May 2026 06:41:29 -0700


luis4a0 opened a new pull request, #12107:
URL: https://github.com/apache/gluten/pull/12107


   ## What changes are proposed in this pull request?
   
   Companion to #12083 (`SCOPED_TIMER` fix). With the per-stage timers in
   `cpuWallTimingList_` now recording real numbers, the natural complement
   is a counter that tells the reader what shape the writer actually sees
   — i.e. for each `write(cb, ...)` call, how many of the input children
   arrive as FLAT / DICTIONARY / CONSTANT / LAZY / other encoding (the
   values of `facebook::velox::VectorEncoding::Simple`).
   
   A new `inputEncodingCounts_` member (`std::array<int64_t, 5>`) is
   maintained on `VeloxHashShuffleWriter`. The counts are taken *before*
   any `getFlattenedRowVector()` invocation in `write(cb, ...)`, so they
   reflect the encoding the writer actually receives — not the post-flatten
   shape (which is always FLAT by construction). Output is gated through
   the existing `VELOX_SHUFFLE_WRITER_LOG_FLAG` log channel and uses the
   same line shape as the existing `cpuWallTimingList_` lines, so behavior
   is unchanged when the flag is off (default).
   
   This information is useful for at least three things:
   
   - Verifying whether a workload would benefit from writer-side
     dictionary / constant pass-through optimizations *before* writing
     the pass-through code (does the writer ever actually see those
     encodings?). Recent OSS PR #9727 (writer-side dictionary rebuild)
     and several proposed constant pass-through patches all share the
     same assumption that the writer sees DICT/CONST inputs — this
     counter makes that assumption directly checkable on any workload.
   - Spotting unexpected upstream operator changes that collapse /
     preserve encodings differently across Velox versions (e.g., a
     change in HashAggregate that starts emitting CONSTANT children
     when it didn't before).
   - Correlating a wall-time spike in a particular stage with the
     encoding mix that drove it.
   
   The counter is always-on (one branch + one increment per input
   child, swamped by the surrounding shuffle work). Scope is
   deliberately narrow: only the 5-bucket encoding mix at the
   hash-shuffle writer entry, surfaced via the existing log channel. A
   per-type breakdown (BIGINT / DECIMAL / VARCHAR / ...) and exposure
   through `ShuffleWriterMetrics` to the JVM SparkListener are both
   plausible follow-ups but kept out of this PR.
   
   Diff: +277 / -0 over 4 files (3 in `cpp/velox/`, 1 new test).
   
   ## How was this patch tested?
   
   This PR adds `cpp/velox/tests/VeloxHashShuffleWriterInputEncodingTest.cc`
   (second commit) with three gtest cases:
   
   - `allFlat` — two batches with 3 flat children each; asserts
     `FLAT` bucket accumulates to 6 and the other 4 buckets stay 0.
   - `mixedFlatDictConst` — one batch with FLAT + FLAT + DICTIONARY +
     CONSTANT children; asserts FLAT=2, DICTIONARY=1, CONSTANT=1, and
     the remaining 2 buckets stay 0.
   - `lazy` — wraps a child in a `LazyVector`; asserts the `LAZY`
     bucket increments (the encoding is captured BEFORE any
     `loadedVector()` call inside the writer).
   
   The new test is a standalone `add_velox_test(...)` entry rather than
   a new `TEST_P` case in `VeloxShuffleWriterTest.cc`, so it isolates
   the counter behavior from the parameterised round-trip matrix —
   which keeps the failure mode obvious if a future change quietly
   breaks one of the buckets.
   
   End-to-end runtime validation in a real Spark/Gluten run was not
   possible in my development environment for the same JNI-symbol-resolution
   reason documented in #12083 — the production gluten-velox JAR takes
   precedence on the JNI path, shadowing the locally-built OSS bundle.
   The CI-built bundle will not have that issue. Likewise, the local
   Velox ep build here is configured with `VELOX_BUILD_TEST_UTILS=OFF`,
   so the new gtest binary was syntax-validated locally (via
   `g++ -fsyntax-only` against the in-tree compile flags) but not
   executed; Gluten CI will run it.
   
   Marked as draft for one more review pass before flipping to ready.
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: GitHub Copilot CLI (Claude Opus 4.7 1M context)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] [VL] Add per-batch input-encoding counter to VeloxHashShuffleWriter [gluten]

Reply via email to