The GitHub Actions job "Required Checks" on texera.git/main has succeeded. Run started by GitHub user github-merge-queue[bot] (triggered by github-merge-queue[bot]).
Head commit for run: 7ae9b35f12748616daf7bcc925fdde2e5def5187 / Xinyuan Lin <[email protected]>

test(workflow-operator): add unit test coverage for filter-family operator executors (#5656)

### What changes were proposed in this PR?

Pin the behavior of four previously uncovered modules in the `FilterOpExec` inheritance hierarchy in `common/workflow-operator`. No production-code changes.

| Spec | Source class | Tests |
| --- | --- | --- |
| `FilterOpExecSpec` | `FilterOpExec` (abstract base) | 9 |
| `RegexOpExecSpec` | `RegexOpExec` | 8 |
| `SubstringSearchOpExecSpec` | `SubstringSearchOpExec` | 10 |
| `RandomKSamplingOpExecSpec` | `RandomKSamplingOpExec` | 7 |

All four spec files follow the one-to-one `<srcClassName>Spec.scala` naming convention. `SpecializedFilterOpExec` already has its own spec; this PR covers the rest of the family.

**Behavior pinned: `FilterOpExec`**

| Surface | Contract |
| --- | --- |
| `processTuple` (matching predicate) | yields the input tuple as a single-element iterator |
| `processTuple` (non-matching predicate) | yields an empty iterator |
| `processTuple` | passes the actual tuple instance to the predicate; ignores the `port` argument |
| `setFilterFunc` | swapping the predicate changes the next `processTuple` result; value-aware predicates branch per tuple |
| Type contract | `FilterOpExec` is both `Serializable` and an `OperatorExecutor` |

**Behavior pinned: `RegexOpExec`**

| Surface | Contract |
| --- | --- |
| matching regex | yields the tuple |
| find semantics | unanchored substring match (`find`, not full-string `matches`) |
| `caseInsensitive = true` / `false` | matches case-insensitively / case-sensitively |
| invalid regex string | construction succeeds (the `Pattern` is compiled lazily); `PatternSyntaxException` surfaces on the first `processTuple` |
| repeated invocations | pattern stays cached; results are stable |
| malformed descriptor JSON | construction throws `JsonProcessingException` |

**Behavior pinned: `SubstringSearchOpExec`**

| Surface | Contract |
| --- | --- |
| substring present / absent | yields the tuple / nothing |
| position in value (start / middle / end) | irrelevant (`String.contains` semantics) |
| `isCaseSensitive = true` / `false` | case-sensitive / case-insensitive (the insensitive mode lowercases both the value and the substring before comparing) |
| empty substring | matches every value, including the empty string |
| repeated invocations | results are stable |
| malformed descriptor JSON | construction throws `JsonProcessingException` |

**Behavior pinned: `RandomKSamplingOpExec`**

| Surface | Contract |
| --- | --- |
| `percentage = 100` | accepts every tuple (1000-sample run) |
| `percentage = 0` | rejects every tuple (1000-sample run) |
| same `workerCount` and `percentage` | identical emission count across two fresh instances (deterministic seed) |
| `percentage = 50` | approximately half pass (1000 ± 150 out of 2000 draws) |
| different `workerCount` | divergent emission sequences (the seed is `workerCount`) |
| malformed descriptor JSON | construction throws `JsonProcessingException` |

`FilterOpExec` is abstract, so its spec uses a minimal, test-only concrete subclass that exposes `setFilterFunc` for behavior-only assertions. The three subclass specs build their descriptor JSON via `objectMapper.writeValueAsString` on a fresh `*OpDesc` (the same fixture pattern as the existing `SpecializedFilterOpExecSpec`).

### Any related issues, documentation, discussions?

Closes #5652.

### How was this PR tested?

Pure unit-test additions; verified locally with:

- `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.filter.FilterOpExecSpec org.apache.texera.amber.operator.regex.RegexOpExecSpec org.apache.texera.amber.operator.substringSearch.SubstringSearchOpExecSpec org.apache.texera.amber.operator.randomksampling.RandomKSamplingOpExecSpec"` (34 tests, all green)
- `sbt scalafmtCheckAll` (clean)
- CI to confirm

### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7 [1M context])

Report URL: https://github.com/apache/texera/actions/runs/27451206440

With regards,
GitHub Actions via GitBox
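For readers unfamiliar with these executors, the contracts in the `FilterOpExec`, `RegexOpExec`, and `SubstringSearchOpExec` tables can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not Texera's code: `Row`, `SketchFilterExec`, `SketchRegexExec`, and `SketchSubstringExec` are hypothetical stand-ins, the real executors operate on Texera tuples and are constructed from descriptor JSON, and the lazy `Pattern` and lowercase-both-sides details only mirror what the tables describe.

```scala
import java.util.Locale
import java.util.regex.Pattern

// Hypothetical stand-in for a Texera tuple: attribute name -> string value.
final case class Row(fields: Map[String, String])

// Hypothetical simplified base class mirroring the pinned FilterOpExec contract:
// emit the tuple when the predicate holds, otherwise emit nothing.
abstract class SketchFilterExec extends java.io.Serializable {
  private var filterFunc: Row => Boolean = _ => true

  // Swapping the predicate affects the next processTuple call.
  def setFilterFunc(f: Row => Boolean): Unit = filterFunc = f

  // The port argument is accepted but ignored, as in the pinned contract.
  def processTuple(row: Row, port: Int): Iterator[Row] =
    if (filterFunc(row)) Iterator.single(row) else Iterator.empty
}

// Hypothetical regex variant. The Pattern is a lazy val, so an invalid regex
// only throws PatternSyntaxException on the first processTuple call.
final class SketchRegexExec(attr: String, regex: String, caseInsensitive: Boolean)
    extends SketchFilterExec {
  private lazy val pattern: Pattern =
    Pattern.compile(regex, if (caseInsensitive) Pattern.CASE_INSENSITIVE else 0)

  // find() performs an unanchored search, unlike matches().
  setFilterFunc(row => pattern.matcher(row.fields(attr)).find())
}

// Hypothetical substring variant with String.contains semantics.
// In case-insensitive mode both sides are lowercased before comparing.
final class SketchSubstringExec(attr: String, substring: String, isCaseSensitive: Boolean)
    extends SketchFilterExec {
  setFilterFunc { row =>
    val value = row.fields(attr)
    if (isCaseSensitive) value.contains(substring)
    else value.toLowerCase(Locale.ROOT).contains(substring.toLowerCase(Locale.ROOT))
  }
}
```

Using `Matcher.find()` rather than `matches()` is what lets a pattern such as `"o W"` match inside `"Hello World"`, and deferring `Pattern.compile` behind a `lazy val` is one way to get the "construction succeeds, first `processTuple` fails" behavior for an invalid regex.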

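The `RandomKSamplingOpExec` rows can likewise be illustrated with a seeded Bernoulli sampler. This sketch assumes, for illustration only, that each tuple is accepted when `nextInt(100) < percentage` on a `scala.util.Random` seeded with `workerCount`; the PR confirms only that the seed is `workerCount`, and `SketchRandomKSampler` and `SamplingDemo` are hypothetical names.

```scala
import scala.util.Random

// Hypothetical sampler: seeded by workerCount (as the PR states); the
// acceptance rule nextInt(100) < percentage is an assumption for illustration.
final class SketchRandomKSampler(percentage: Int, workerCount: Int) {
  private val rand = new Random(workerCount)

  // nextInt(100) is uniform over 0..99, so 100 always accepts and 0 never does.
  def accept(): Boolean = rand.nextInt(100) < percentage
}

object SamplingDemo {
  // Number of accepted draws for a fresh sampler instance.
  def countAccepted(percentage: Int, workerCount: Int, draws: Int): Int = {
    val sampler = new SketchRandomKSampler(percentage, workerCount)
    (1 to draws).count(_ => sampler.accept())
  }

  // Full accept/reject sequence for a fresh sampler instance.
  def decisions(percentage: Int, workerCount: Int, draws: Int): Vector[Boolean] = {
    val sampler = new SketchRandomKSampler(percentage, workerCount)
    Vector.fill(draws)(sampler.accept())
  }
}
```

Because the generator is seeded, two fresh instances with the same `workerCount` replay identical decisions, which is what allows a spec to assert exact counts at 0% and 100%, a bounded count at 50%, and divergence across different `workerCount` values.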