Akanksha-kedia commented on PR #18892: URL: https://github.com/apache/pinot/pull/18892#issuecomment-4855078364
## Code Review: `fix/split-part-memory-optimization` The core idea is sound — eliminating the per-row `String[]` allocation in the 4-arg `splitPart` overload is a legitimate hot-path optimization, and the implementation faithfully replicates `splitByWholeSeparator` semantics. The deterministic test table is thorough. Two **MAJOR** gaps and several **MINOR** issues below. --- ### C4 — Performance **[MAJOR] No benchmark covers the 4-arg overload (the thing this PR actually changes)** `BenchmarkSplitPart` only benchmarks `splitPart(input, delimiter, index)` (3-arg). The PR's optimization target is `splitPart(input, delimiter, limit, index)` (4-arg), and there is no way to verify or quantify the improvement without a benchmark. > Add `testSplitPartLimitedNew()` / `testSplitPartLimitedOld()` methods to `BenchmarkSplitPart`, mirroring the existing 3-arg pair. **[MINOR] No single-char delimiter fast path for negative indices in the 4-arg path** The 3-arg overload has `splitPartNegativeIdxSingleCharDelim` which avoids a second forward scan for single-char delimiters (lines 609–612). The 4-arg overload always executes two full scans for negative indices regardless of delimiter length. Single-char delimiters (`,`, `.`, `/`) dominate real workloads. > Add a `// TODO: add single-char fast path for negative indices, analogous to splitPartNegativeIdxSingleCharDelim` comment. --- ### C5 — Correctness & Nulls **[MINOR] `while (pos <= inputLen)` loop bound at `splitPartLimitedForward` — the `=` arm is dead code** Every code path that sets `pos >= inputLen` returns before the outer loop re-evaluates. The loop is functionally `while (pos < inputLen)`. > Change to `while (pos < inputLen)` for clarity. **[MINOR] `if (totalFields == 0)` guard at line 752 is unreachable** `countFieldsLimited` returns at minimum `1` in all cases. A `totalFields` of `0` is structurally impossible. > Remove the dead guard or replace with `assert totalFields > 0`. **[MINOR] Stale comment at `splitPartLimitedForward` lines 839–841** `// (this case is handled by the trailing-delimiter logic below)` — the trailing-delimiter logic is on a completely separate code path, not "below" this return. > Replace with: `// The requested index is beyond the last field.` --- ### C6 — Testing **[MAJOR] `testSplitPartRandomized` only fuzz-tests the 3-arg overload — the 4-arg overload has zero randomized coverage** ```java // line 316: String actual = StringFunctions.splitPart(input, delimiter, index); // ← 3-arg only ``` > Add `testSplitPartLimitedRandomized` cross-checking `splitPart(input, delimiter, limit, index)` against `splitPartArrayBased(StringUtils.splitByWholeSeparator(input, delimiter, limit), index)` over 10k random tuples. **[MINOR] No SQL-level test coverage for `splitPart` with a limit** `pinot-query-runtime/src/test/resources/queries/StringFunctions.json` has no `splitPart` entries. SQL-level tests catch function-registry wiring issues that unit tests miss. --- ### C3 — Naming & API **[MINOR] Missing `// compare with BenchmarkSplitPart` cross-reference in the 4-arg overload** The 3-arg `splitPart` (line 588) has this comment. The 4-arg overload — now the more complex and performance-sensitive — has no such pointer. --- ### C8 — Process & Scope **[MINOR] Commit message overstates fuzz coverage** > *"All existing unit tests (172 cases including randomized fuzz) pass unchanged, confirming behavioral equivalence."* The fuzz test only exercises the 3-arg path. The 4-arg path is covered by 89 deterministic table-driven cases only. Scope is clean — single file, no creep. PR description accurately describes the change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
