The GitHub Actions job "Benchmarks PR Comment" on texera.git/main has failed.
Run started by GitHub user PG1204 (triggered by PG1204).

Head commit for run:
7ae9b35f12748616daf7bcc925fdde2e5def5187 / Xinyuan Lin <[email protected]>
test(workflow-operator): add unit test coverage for filter-family operator executors (#5656)

### What changes were proposed in this PR?

Pin behavior of four previously-uncovered modules in the `FilterOpExec`
inheritance hierarchy in `common/workflow-operator`. No production-code
changes.

| Spec | Source class | Tests |
| --- | --- | --- |
| `FilterOpExecSpec` | `FilterOpExec` (abstract base) | 9 |
| `RegexOpExecSpec` | `RegexOpExec` | 8 |
| `SubstringSearchOpExecSpec` | `SubstringSearchOpExec` | 10 |
| `RandomKSamplingOpExecSpec` | `RandomKSamplingOpExec` | 7 |

All four spec files follow the `<srcClassName>Spec.scala` one-to-one
convention. `SpecializedFilterOpExec` already has its own spec; this PR
covers the rest of the family.

**Behavior pinned — `FilterOpExec`**

| Surface | Contract |
| --- | --- |
| `processTuple` (matching predicate) | yields the input tuple as a single-element iterator |
| `processTuple` (non-matching predicate) | yields an empty iterator |
| `processTuple` | passes the actual tuple instance to the predicate; ignores the `port` argument |
| `setFilterFunc` | swapping the predicate changes the next `processTuple` result; value-aware predicates branch per-tuple |
| Type contract | `FilterOpExec` is a `Serializable OperatorExecutor` |
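
The contract in the table above can be illustrated with a minimal sketch. `Tuple`, `FilterSketch`, and `TestFilter` are hypothetical stand-ins introduced here for illustration; they are not Texera's actual classes, and the real `FilterOpExec` signature may differ.

```scala
// Hypothetical stand-ins for illustration only; not Texera's classes.
object FilterContractSketch {
  final case class Tuple(fields: Map[String, Any])

  abstract class FilterSketch extends Serializable {
    // Predicate deciding whether a tuple passes; subclasses install it.
    private var filterFunc: Tuple => Boolean = _ => true

    protected def setFilterFunc(f: Tuple => Boolean): Unit = { filterFunc = f }

    // `port` is accepted but never read, mirroring the pinned contract.
    def processTuple(tuple: Tuple, port: Int): Iterator[Tuple] =
      if (filterFunc(tuple)) Iterator.single(tuple) else Iterator.empty
  }

  // Test-only concrete subclass that exposes the protected setter.
  final class TestFilter extends FilterSketch {
    def install(f: Tuple => Boolean): Unit = setFilterFunc(f)
  }
}
```

Swapping the predicate through `install` changes what the next `processTuple` call yields, which is the behavior the `setFilterFunc` row describes.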

**Behavior pinned — `RegexOpExec`**

| Surface | Contract |
| --- | --- |
| matching regex | yields the tuple |
| find-semantics | unanchored substring match (not full-string `matches`) |
| `caseInsensitive = true` / `false` | matches case-(in)sensitively |
| invalid regex string | construction succeeds (lazy `Pattern`); `PatternSyntaxException` surfaces on first `processTuple` |
| repeated invocations | pattern stays cached; results are stable |
| malformed descriptor JSON | construction throws `JsonProcessingException` |
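
A rough sketch of how find-semantics and a lazily compiled pattern could produce the rows above. `RegexFilterSketch` and its `accepts` method are hypothetical names; only `java.util.regex.Pattern` and `Matcher.find` are real APIs here, and the actual `RegexOpExec` may be structured differently.

```scala
import java.util.regex.Pattern

// Hypothetical sketch; not Texera's RegexOpExec.
final class RegexFilterSketch(regex: String, caseInsensitive: Boolean) {
  // Compiled on first access, so an invalid regex does not fail the
  // constructor; once compiled, the same Pattern is reused.
  private lazy val pattern: Pattern =
    Pattern.compile(regex, if (caseInsensitive) Pattern.CASE_INSENSITIVE else 0)

  // `find` searches for the pattern anywhere in the value, whereas
  // `matches` would require the entire value to match.
  def accepts(value: String): Boolean = pattern.matcher(value).find()
}
```

Under this sketch, `new RegexFilterSketch("(", false)` constructs without error, and the `PatternSyntaxException` only appears when `accepts` is first called.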

**Behavior pinned — `SubstringSearchOpExec`**

| Surface | Contract |
| --- | --- |
| substring present / absent | yields tuple / nothing |
| position in value (start / middle / end) | irrelevant: `String.contains` semantics |
| `isCaseSensitive = true` / `false` | case-(in)sensitive (both sides lowercased before comparison) |
| empty substring | matches every value, including the empty string |
| repeated invocations | results stable |
| malformed descriptor JSON | construction throws `JsonProcessingException` |
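
The `String.contains` semantics in the table can be sketched as follows. `SubstringFilterSketch` is a hypothetical name, and lowercasing both sides is only one way to get case-insensitive matching; the real `SubstringSearchOpExec` may differ in detail.

```scala
// Hypothetical sketch; not Texera's SubstringSearchOpExec.
final class SubstringFilterSketch(substring: String, isCaseSensitive: Boolean) {
  // Position is irrelevant: `contains` succeeds at the start, middle, or end.
  // An empty substring is contained in every string, including "".
  def accepts(value: String): Boolean =
    if (isCaseSensitive) value.contains(substring)
    else value.toLowerCase.contains(substring.toLowerCase)
}
```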

**Behavior pinned — `RandomKSamplingOpExec`**

| Surface | Contract |
| --- | --- |
| `percentage = 100` | accepts every tuple (1000-sample run) |
| `percentage = 0` | rejects every tuple (1000-sample run) |
| Same `workerCount` + `percentage` | identical emission count across two fresh instances (deterministic seed) |
| `percentage = 50` | approximately half pass (within ±150 of 1000 over 2000 draws) |
| Different `workerCount` | divergent emission sequences (the seed is `workerCount`) |
| malformed descriptor JSON | construction throws `JsonProcessingException` |
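
A seeded sampler consistent with the rows above might look like the sketch below. `SamplingFilterSketch` is a hypothetical name, and the acceptance rule (`nextInt(100) < percentage`) is an assumption chosen so that 0 rejects everything and 100 accepts everything; the source only states that the seed is `workerCount`.

```scala
import scala.util.Random

// Hypothetical sketch; the acceptance rule is an assumption, not
// necessarily the one RandomKSamplingOpExec uses.
final class SamplingFilterSketch(workerCount: Int, percentage: Int) {
  // Seeding with workerCount makes two fresh instances replay the same draws.
  private val rand = new Random(workerCount)

  // nextInt(100) is uniform on 0..99, so percentage = 0 never passes
  // and percentage = 100 always passes.
  def accepts(): Boolean = rand.nextInt(100) < percentage

  // Number of accepted draws among the next `n` draws.
  def countAccepted(n: Int): Int = Iterator.fill(n)(accepts()).count(b => b)
}
```

If the draws behaved like independent fair coin flips, 2000 of them would have a standard deviation of √(2000 × 0.5 × 0.5) ≈ 22.4, so the ±150 window in the table is roughly 6.7 standard deviations wide on each side.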

`FilterOpExec` is abstract, so the spec uses a minimal test-only
concrete subclass that exposes `setFilterFunc` for behavior-only
assertions. The three subclass specs build descriptor JSON via
`objectMapper.writeValueAsString` of a fresh `*OpDesc` (same fixture
pattern as the existing `SpecializedFilterOpExecSpec`).

### Any related issues, documentation, discussions?

Closes #5652.

### How was this PR tested?

Pure unit-test additions; verified locally with:

- `sbt "WorkflowOperator/testOnly
org.apache.texera.amber.operator.filter.FilterOpExecSpec
org.apache.texera.amber.operator.regex.RegexOpExecSpec
org.apache.texera.amber.operator.substringSearch.SubstringSearchOpExecSpec
org.apache.texera.amber.operator.randomksampling.RandomKSamplingOpExecSpec"`
— 34 tests, all green
- `sbt scalafmtCheckAll` — clean
- CI to confirm

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7 [1M context])

Report URL: https://github.com/apache/texera/actions/runs/27452427793

With regards,
GitHub Actions via GitBox
