The GitHub Actions job "License Binary Checker" on texera.git/release/v1.2 has failed. Run started by GitHub user xuang7 (triggered by xuang7).
Head commit for run:
b876b475d93c37696b8da0436d7ad3e7263e6317 / Suyash Jain <[email protected]>
fix(workflow-operator): no null padding in reservoir sampling (#5606)

### What changes were proposed in this PR?

`ReservoirSamplingOpExec` allocates a fixed-size reservoir of length `count` (the per-worker share of `k`). When a worker receives fewer tuples than `count`, only the first `n` slots are filled, but `onFinish` returned the whole array, yielding `count - n` trailing `null` entries. The nulls are currently swallowed by a distant null-guard in `DataProcessor`, so the bug is latent, but the operator violates the "do not emit null tuples" contract and breaks if that guard is ever narrowed or bypassed.

```
Before:  input < k  ->  onFinish emits [t0 .. tn-1, null, ..., null]  (engine guard hides them)
After:   input < k  ->  onFinish emits [t0 .. tn-1]                   (no nulls emitted at all)
```

The fix emits only the filled prefix:

```scala
override def onFinish(port: Int): Iterator[TupleLike] =
  reservoir.iterator.take(n)
```

`take(n)` is a no-op when `n >= count` (input ≥ k), so the sampled output is unchanged in the normal case.

### Any related issues, documentation, discussions?

Closes #5592

### How was this PR tested?

Added three regression cases to `ReservoirSamplingOpExecSpec`:

| Case | Asserts |
| --- | --- |
| `input size < k` | only the received tuples are emitted, in order, no nulls |
| empty input | `onFinish` emits nothing |
| skewed partitioning (`k=10`, 3 workers, worker 0 gets 2 tuples) | no null padding for an under-filled worker share |

All three fail against the old `reservoir.iterator` and pass with `reservoir.iterator.take(n)`; the 9 pre-existing cases stay green (TDD red → green verified by stashing the source fix).

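The null padding described above can be reproduced in isolation. The following is a minimal sketch, assuming a simplified string-valued reservoir; `ReservoirSketch`, `processTuple`, `onFinishPadded`, and `onFinishTrimmed` are hypothetical names chosen for illustration and are not the actual Texera operator code.

```scala
import scala.util.Random

// Hypothetical stand-in for the operator, NOT the real ReservoirSamplingOpExec.
// `count` is this worker's share of k; tuples are modeled as plain strings.
final class ReservoirSketch(count: Int, seed: Long = 42L) {
  // Reference arrays start out filled with null, so unfilled slots stay null.
  private val reservoir = new Array[String](count)
  private val rng = new Random(seed)
  private var n = 0 // number of tuples received so far

  def processTuple(t: String): Unit = {
    if (n < count) {
      reservoir(n) = t // fill phase: take the next free slot
    } else {
      // Replacement phase: keep the new tuple with probability count / (n + 1).
      val i = rng.nextInt(n + 1)
      if (i < count) reservoir(i) = t
    }
    n += 1
  }

  // Old behavior: iterates every slot, including the unfilled (null) ones.
  def onFinishPadded: Iterator[String] = reservoir.iterator

  // Fixed behavior: emits only the filled prefix; a no-op once n >= count.
  def onFinishTrimmed: Iterator[String] = reservoir.iterator.take(n)
}
```

With `count = 5` and only two input tuples, `onFinishPadded` yields the two tuples followed by three `null` entries, while `onFinishTrimmed` yields just the two tuples. Once at least `count` tuples have arrived, both iterators return the same elements.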
```
sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.reservoirsampling.ReservoirSamplingOpExecSpec"
# Tests: succeeded 12, failed 0, canceled 0, ignored 0, pending 0
```

`sbt WorkflowOperator/scalafixAll` and `sbt WorkflowOperator/scalafmtAll` produce no further diff.

### Was this PR authored or co-authored using generative AI tooling?

Yes, partially. I (Suyash Jain) worked on this PR together with Claude Code as a pair-programming assistant. I reviewed the final diff, ran the spec locally, and verified the red → green behavior of the new regression tests myself before opening the PR.

Generated-by: Claude Code (Claude Opus 4.7)

(backported from commit d5f5e12fb6879f15dbcf0c9cf6aaae3b532784e6)

Co-authored-by: Xuan Gu <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/27444261750

With regards,
GitHub Actions via GitBox
