yaooqinn commented on PR #12211:
URL: https://github.com/apache/gluten/pull/12211#issuecomment-4598937845

   Thanks @jackylee-ch, V3 layout is a sensible extension of the cache-stats 
wire we landed in #12092 / #12196. Several things to discuss before this lands:
   
   **1. Benchmark needs to be re-run.** The checked-in `-results.txt` is `10K 
rows / 4 partitions / 1 iteration` on an Apple M5 Pro — `Stdev=0` across the 
board because there's only one sample. Differences in the 1-3 ms range (e.g. 
"1.1X" at all-16-cols read, where lazy mode *physically* cannot be faster than 
eager) are noise. Also `build 1.9X` is surprising because V3 does N 
`serializeSingleColumn` calls vs V2's single-pass `batchSerialize` — the 
ordering legacy > V2 > V3 doesn't match the physical work done; this needs 
reruns on a server / GHA-equivalent runner with iter≥3 and `100M rows / 32 
partitions` (matching the code defaults). Please also add a **cache memory 
footprint** column — V3 per-col framing + `getFlattenedRowVector()` flattening 
Dictionary/Constant encodings could regress cache size significantly for 
dict-encoded payloads, and that's currently unmeasured.
   
   **2. Do we really need a new SQLConf?** V3 functionally supersedes V2 (V3 
frames also carry `statsBlob`), so this isn't a new behavioral feature — it's a 
wire-format upgrade. Adding a dedicated `lazy.deserialization.enabled` boolean 
commits Gluten to maintaining three cache paths (legacy / V2-stats / 
V3-lazy-and-stats) and a three-level fallback chain. Once we trust V3, we'd 
want to deprecate V2-stats, which means another deprecation cycle. Could we 
either (a) skip the conf and gate V3 behind `partitionStats.enabled` once it's 
stable, or (b) turn `partitionStats.enabled` into a `string` conf with `off | 
v2 | v3` values? `Configuration.md` already warns "V3 is NOT backward 
compatible with V2 readers" + `default=false` — operationally nobody is going 
to flip this, so the conf risks being long-lived dead code.
   
   **3. Cross-language test parity vs #12196.** V3 has no cpp-side byte-equal 
golden test; JVM-side tests synthesize their own frames via `craftV3Framed`. We 
just established the cpp-golden ↔ JVM-parser round-trip pattern in #12196 
specifically because layout drift between halves is a correctness hazard. V3 
needs the same: a `framedSerializeWithStatsV3Golden` cpp test pinning a 
byte-stable literal + a JVM parser round-trip over that same literal.
   
   **4. Smaller items.**
   - All-null column case not covered (we hit the PrestoSerde uninit-values bug 
in #12092 development, same risk class for per-col path).
   - `getFlattenedRowVector()` side effect on Dictionary/Constant encoding not 
documented.
   - The `// JNI pin outlives` comment in `deserializeV3` describes a non-issue 
(copies are made synchronously in step 6, the lazy loader doesn't depend on the 
pin) — please trim.
   - Two near-identical magic checks (`parseFramedBytes` byte[3] dispatch vs 
`isV3Format` 4-byte compare) — please consolidate.
   - Consider folding `statsExtV3AvailableFlag` and `statsExtAvailableFlag` 
into a single capability enum (`Unknown | V2 | V3 | Unavailable`) — two 
independent one-shot latches double the operational diagnosis surface.
   
   Happy to file any of these as separate issues if it helps.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to