aglinxinyuan opened a new issue, #5652:
URL: https://github.com/apache/texera/issues/5652
## Background
Four modules in `common/workflow-operator` form a `FilterOpExec` inheritance
hierarchy that lacks dedicated unit-spec coverage. The base `FilterOpExec` is
abstract; the concrete subclasses parse a JSON descriptor at construction and
call `setFilterFunc` with their per-class predicate.
| Source class | Package | Purpose |
| --- | --- | --- |
| `FilterOpExec` | `operator.filter` | Abstract base with a pluggable `filterFunc: Tuple => Boolean`; `processTuple` yields the tuple iff `filterFunc(tuple)` is true |
| `RegexOpExec` | `operator.regex` | Compiles a `Pattern` from the descriptor; emits tuples whose attribute matches the pattern (`find`-semantics); honors `caseInsensitive` |
| `SubstringSearchOpExec` | `operator.substringSearch` | Emits tuples whose attribute contains the descriptor's substring; honors `isCaseSensitive` |
| `RandomKSamplingOpExec` | `operator.randomksampling` | Emits each tuple with probability `desc.percentage / 100.0`; seed = `workerCount` (deterministic for the same worker count) |
`SpecializedFilterOpExec` already has its own spec; this issue covers the rest
of the family.
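The regex and substring rows above hinge on two details: `find`-semantics (a match anywhere in the value, as opposed to a whole-string match) and the case flags. The following is a minimal sketch of those semantics in plain Scala, not the Texera implementation. `MatchSketch`, `regexMatches`, and `substringMatches` are hypothetical names, and the case-insensitive substring branch assumes both sides are lowercased before a `contains` check.

```scala
import java.util.regex.Pattern

// Hypothetical stand-ins for the per-class predicates; not Texera code.
object MatchSketch {
  // find-semantics: succeeds when any subsequence of `value` matches `regex`.
  def regexMatches(regex: String, caseInsensitive: Boolean, value: String): Boolean = {
    val flags = if (caseInsensitive) Pattern.CASE_INSENSITIVE else 0
    // matcher.matches() would instead require the whole input to match.
    Pattern.compile(regex, flags).matcher(value).find()
  }

  // Assumption: the case-insensitive branch lowercases both sides.
  def substringMatches(substring: String, isCaseSensitive: Boolean, value: String): Boolean =
    if (isCaseSensitive) value.contains(substring)
    else value.toLowerCase.contains(substring.toLowerCase)

  def main(args: Array[String]): Unit = {
    println(regexMatches("b+", caseInsensitive = false, "aabbcc"))         // true
    println(regexMatches("texera", caseInsensitive = false, "TEXERA"))     // false
    println(substringMatches("tex", isCaseSensitive = false, "Texera"))    // true
    println(substringMatches("", isCaseSensitive = true, "any value"))     // true
  }
}
```

The empty-substring case falls out of `String.contains`, which returns true for `""` on every input, so no special handling is needed in the predicate.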
## Behavior to pin
| Surface | Contract |
| --- | --- |
| `FilterOpExec.processTuple` (matching predicate) | yields the single tuple |
| `FilterOpExec.processTuple` (non-matching predicate) | yields an empty `Iterator` |
| `FilterOpExec.setFilterFunc` | swapping the predicate changes the next `processTuple` result |
| `RegexOpExec` (pattern matches) | yields the tuple via `Pattern.matcher.find` |
| `RegexOpExec` (pattern does not match) | yields nothing |
| `RegexOpExec` with `caseInsensitive = true` | matches case-insensitively |
| `RegexOpExec` with `caseInsensitive = false` | matches case-sensitively |
| `RegexOpExec` constructor with invalid descriptor JSON | propagates a Jackson parse exception |
| `SubstringSearchOpExec` with `isCaseSensitive = true` | matches case-sensitively |
| `SubstringSearchOpExec` with `isCaseSensitive = false` | matches by lowercased containment |
| `SubstringSearchOpExec` (empty substring) | matches every tuple (because `""` is in any string) |
| `RandomKSamplingOpExec` with `percentage = 100` | accepts every tuple |
| `RandomKSamplingOpExec` with `percentage = 0` | rejects every tuple |
| `RandomKSamplingOpExec` (intermediate percentage, deterministic seed) | produces a deterministic emission count over a large sample |
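The three `RandomKSamplingOpExec` rows can be checked without knowing the exact count in advance: two samplers built with the same seed must agree, and the boundary percentages must accept or reject everything. The sketch below illustrates that approach with a seeded `scala.util.Random`. `SamplingSketch`, `makeSampler`, and `countAccepted` are hypothetical names, and the comparison `nextDouble() < percentage / 100.0` is an assumption about the predicate's shape rather than the production code.

```scala
import scala.util.Random

// Hypothetical stand-in for the sampling predicate; not Texera code.
object SamplingSketch {
  // Builds a stateful predicate backed by a seeded RNG.
  // Assumption: a draw in [0.0, 1.0) is accepted when it falls below percentage / 100.0.
  def makeSampler(percentage: Int, seed: Int): () => Boolean = {
    val rng = new Random(seed)
    () => rng.nextDouble() < percentage / 100.0
  }

  // Counts how many of n draws the sampler accepts.
  def countAccepted(percentage: Int, seed: Int, n: Int): Int = {
    val accept = makeSampler(percentage, seed)
    (1 to n).count(_ => accept())
  }

  def main(args: Array[String]): Unit = {
    val n = 10000
    println(countAccepted(100, seed = 4, n) == n) // true: every draw is below 1.0
    println(countAccepted(0, seed = 4, n) == 0)   // true: no draw is below 0.0
    // Same seed, same sequence of draws, same count.
    println(countAccepted(30, seed = 4, n) == countAccepted(30, seed = 4, n)) // true
  }
}
```

Comparing two runs with the same seed avoids hard-coding a magic count that would change if the underlying RNG or comparison ever changed; a loose band around the expected proportion can be added as a sanity check.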
## Scope
- New spec files (one per source class):
  - `FilterOpExecSpec.scala`
  - `RegexOpExecSpec.scala`
  - `SubstringSearchOpExecSpec.scala`
  - `RandomKSamplingOpExecSpec.scala`
- No production-code changes.
- `FilterOpExec` is exercised via a test-only concrete subclass (it is abstract).
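
Exercising an abstract base through a test-only subclass could look roughly like the sketch below. It is self-contained and does not use Texera types: `Row` stands in for `Tuple`, `FilterExecSketch` is a hypothetical mirror of the base class as described in this issue, and the default reject-all predicate is an assumption.

```scala
// Hypothetical mirror of the base-class contract; not Texera code.
object FilterBaseSketch {
  // Stand-in for Texera's Tuple type (assumption).
  type Row = Map[String, Any]

  abstract class FilterExecSketch {
    // Assumption: defaults to rejecting everything until a predicate is set.
    private var filterFunc: Row => Boolean = (_: Row) => false

    def setFilterFunc(f: Row => Boolean): Unit = {
      filterFunc = f
    }

    // Yields the row iff the current predicate accepts it.
    def processTuple(row: Row): Iterator[Row] =
      if (filterFunc(row)) Iterator(row) else Iterator.empty
  }

  // Test-only concrete subclass: adds no behavior, exists only so the base can be instantiated.
  final class TestFilterExec extends FilterExecSketch

  def main(args: Array[String]): Unit = {
    val exec = new TestFilterExec
    val row: Row = Map("id" -> 1)

    exec.setFilterFunc(_ => true)
    println(exec.processTuple(row).toList)  // List(Map(id -> 1))

    // Swapping the predicate changes the next result.
    exec.setFilterFunc(_ => false)
    println(exec.processTuple(row).isEmpty) // true
  }
}
```

Keeping the subclass empty means any failure points at the base class itself and not at test scaffolding.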