aglinxinyuan opened a new pull request, #5656:
URL: https://github.com/apache/texera/pull/5656

   ### What changes were proposed in this PR?
   
   Pin the behavior of four previously uncovered classes in the `FilterOpExec` 
inheritance hierarchy in `common/workflow-operator`. No production-code changes.
   
   | Spec | Source class | Tests |
   | --- | --- | --- |
   | `FilterOpExecSpec` | `FilterOpExec` (abstract base) | 9 |
   | `RegexOpExecSpec` | `RegexOpExec` | 8 |
   | `SubstringSearchOpExecSpec` | `SubstringSearchOpExec` | 10 |
   | `RandomKSamplingOpExecSpec` | `RandomKSamplingOpExec` | 7 |
   
   All four spec files follow the `<srcClassName>Spec.scala` one-to-one 
convention. `SpecializedFilterOpExec` already has its own spec; this PR covers 
the rest of the family.
   
   **Behavior pinned — `FilterOpExec`**
   
   | Surface | Contract |
   | --- | --- |
   | `processTuple` (matching predicate) | yields the input tuple as a 
single-element iterator |
   | `processTuple` (non-matching predicate) | yields an empty iterator |
   | `processTuple` | passes the actual tuple instance to the predicate; 
ignores the `port` argument |
   | `setFilterFunc` | swapping the predicate changes the next `processTuple` 
result; value-aware predicates branch per-tuple |
   | Type contract | `FilterOpExec` is a `Serializable OperatorExecutor` |
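
The `FilterOpExec` contract above can be sketched with a self-contained stand-in. This is a rough model, not Texera's code: `FilterExec`, `TestFilterExec`, `use`, the `Map`-based `Tuple` alias, and the `protected` visibility of `setFilterFunc` are all assumptions made for illustration.

```scala
// Stand-in model only: none of these types are Texera's real classes.
object FilterContractSketch {
  // Assumption: a tuple is simplified to a plain Map.
  type Tuple = Map[String, Any]

  // Hypothetical abstract base mirroring the contract in the table above.
  abstract class FilterExec extends Serializable {
    private var filterFunc: Tuple => Boolean = _ => true

    // Assumed to be protected here so the test subclass has something to expose.
    protected def setFilterFunc(f: Tuple => Boolean): Unit = filterFunc = f

    // The port argument is accepted but never consulted.
    def processTuple(tuple: Tuple, port: Int): Iterator[Tuple] =
      if (filterFunc(tuple)) Iterator.single(tuple) else Iterator.empty
  }

  // Test-only concrete subclass that makes the predicate swappable from a spec.
  class TestFilterExec extends FilterExec {
    def use(f: Tuple => Boolean): Unit = setFilterFunc(f)
  }
}
```

In this model, calling `use(_ => false)` on a `TestFilterExec` makes the next `processTuple` call return an empty iterator, while a matching predicate returns the same tuple instance as a single-element iterator.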
   
   **Behavior pinned — `RegexOpExec`**
   
   | Surface | Contract |
   | --- | --- |
   | matching regex | yields the tuple |
   | find-semantics | unanchored substring match (not full-string `matches`) |
   | `caseInsensitive = true` / `false` | matches case-(in)sensitively |
   | invalid regex string | construction succeeds (lazy `Pattern`); 
`PatternSyntaxException` surfaces on first `processTuple` |
   | repeated invocations | pattern stays cached; results are stable |
   | malformed descriptor JSON | construction throws `JsonProcessingException` |
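
The find-semantics and lazy-compilation rows rely on standard `java.util.regex` behavior, which the following sketch illustrates. `RegexFilter` and `accepts` are hypothetical names, and the class is only a stand-in for the executor, not its actual implementation.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}
import scala.util.Try

object RegexContractSketch {
  // Hypothetical stand-in: the pattern is compiled lazily, on first use.
  class RegexFilter(regex: String, caseInsensitive: Boolean) {
    lazy val pattern: Pattern =
      Pattern.compile(regex, if (caseInsensitive) Pattern.CASE_INSENSITIVE else 0)

    // find() succeeds if the pattern occurs anywhere in the value;
    // matches() would require the whole value to match.
    def accepts(value: String): Boolean = pattern.matcher(value).find()
  }
}
```

With this model, `new RegexFilter("b+", false).accepts("aabbcc")` holds even though `matches()` rejects the same input, and `new RegexFilter("[unclosed", false)` constructs without error but throws `PatternSyntaxException` on the first `accepts` call.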
   
   **Behavior pinned — `SubstringSearchOpExec`**
   
   | Surface | Contract |
   | --- | --- |
   | substring present / absent | yields tuple / nothing |
   | position in value (start / middle / end) | irrelevant — `String.contains` 
semantics |
   | `isCaseSensitive = true` / `false` | case-(in)sensitive (both sides are 
lowercased before comparison when insensitive) |
   | empty substring | matches every value, including the empty string |
   | repeated invocations | results stable |
   | malformed descriptor JSON | construction throws `JsonProcessingException` |
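
A minimal sketch of the substring contract, assuming the check reduces to `String.contains` with both sides lowercased in the case-insensitive path. The `accepts` helper is hypothetical; the real executor reads the value from a tuple attribute.

```scala
object SubstringContractSketch {
  // Hypothetical helper modelling the contract in the table above.
  def accepts(value: String, substring: String, isCaseSensitive: Boolean): Boolean =
    if (isCaseSensitive) value.contains(substring)
    else value.toLowerCase.contains(substring.toLowerCase)
}
```

Because `contains` treats the empty string as present in every string, `accepts("", "", true)` is true in this model, which matches the empty-substring row.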
   
   **Behavior pinned — `RandomKSamplingOpExec`**
   
   | Surface | Contract |
   | --- | --- |
   | `percentage = 100` | accepts every tuple (1000-sample run) |
   | `percentage = 0` | rejects every tuple (1000-sample run) |
   | Same `workerCount` + `percentage` | identical emission count across two 
fresh instances (deterministic seed) |
   | `percentage = 50` | approximately half pass (within ±150 of 1000 over 2000 
draws) |
   | Different `workerCount` | divergent emission sequences (the seed is 
`workerCount`) |
   | malformed descriptor JSON | construction throws `JsonProcessingException` |
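
The determinism rows follow from seeding the random generator with `workerCount`. The sketch below assumes an acceptance rule of "a draw in [0, 100) is below `percentage`"; that rule, along with the `Sampler` and `countAccepted` names, is an illustration and may differ from the actual executor.

```scala
import scala.util.Random

object SamplingContractSketch {
  // Hypothetical stand-in: seeded by workerCount, as the table states.
  class Sampler(workerCount: Int, percentage: Int) {
    private val rand = new Random(workerCount.toLong)

    // Assumed rule: accept when a uniform draw in [0, 100) falls below percentage.
    def accepts(): Boolean = rand.nextInt(100) < percentage
  }

  // Counts accepted draws from a fresh sampler, so repeated calls are comparable.
  def countAccepted(workerCount: Int, percentage: Int, draws: Int): Int = {
    val sampler = new Sampler(workerCount, percentage)
    (1 to draws).count(_ => sampler.accepts())
  }
}
```

Under this rule, `percentage = 100` accepts every draw and `percentage = 0` rejects every draw, and two fresh samplers built with the same `workerCount` and `percentage` produce the same count.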
   
   `FilterOpExec` is abstract, so the spec uses a minimal test-only concrete 
subclass that exposes `setFilterFunc` for behavior-only assertions. The three 
subclass specs build descriptor JSON via `objectMapper.writeValueAsString` of a 
fresh `*OpDesc` (same fixture pattern as the existing 
`SpecializedFilterOpExecSpec`).
   
   ### Any related issues, documentation, discussions?
   
   Closes #5652.
   
   ### How was this PR tested?
   
   Pure unit-test additions; verified locally with:
   
   - `sbt "WorkflowOperator/testOnly 
org.apache.texera.amber.operator.filter.FilterOpExecSpec 
org.apache.texera.amber.operator.regex.RegexOpExecSpec 
org.apache.texera.amber.operator.substringSearch.SubstringSearchOpExecSpec 
org.apache.texera.amber.operator.randomksampling.RandomKSamplingOpExecSpec"` — 
34 tests, all green
   - `sbt scalafmtCheckAll` — clean
   - CI to confirm
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7 [1M context])


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.