aglinxinyuan opened a new pull request, #5449:
URL: https://github.com/apache/texera/pull/5449
Closes #5448.
## Summary
Pin behavior of two previously-uncovered modules in
`engine/architecture/controller/execution`:
| Spec | Source class | Tests |
| --- | --- | --- |
| `OperatorExecutionSpec` | `OperatorExecution` | 14 (1 pending) |
| `RegionExecutionSpec` | `RegionExecution` | 14 |
Both spec files follow the `<srcClassName>Spec.scala` one-to-one convention.
## Behavior pinned
### `OperatorExecution`
| Surface | Contract |
| --- | --- |
| `initWorkerExecution(workerId)` | registers a fresh `WorkerExecution`
under `workerId` and returns it |
| `getWorkerExecution(workerId)` | returns the previously-initialized
`WorkerExecution` |
| `getWorkerIds` | empty for a fresh operator; otherwise the set of every
initialized worker id |
| `getState` (no workers) | `UNINITIALIZED` (empty-iterable fallthrough in
`ExecutionUtils.aggregateStates`) |
| `getState` (all COMPLETED) | `COMPLETED` |
| `getState` (any RUNNING) | `RUNNING` |
| `getState` (all UNINITIALIZED) | `UNINITIALIZED` |
| `getStats` | per-port input / output metrics summed across workers (count
+ size per `portId`); time fields are per-worker sums; distinct ports stay
separate |
| `getStats` (no workers) | empty input / output metrics; zero time totals |
| `isInputPortCompleted` / `isOutputPortCompleted` | `true` only when every
worker reports the requested port as completed; input and output ports with the
same `portId` are tracked independently |
### `RegionExecution`
| Surface | Contract |
| --- | --- |
| `initOperatorExecution(opId)` | registers a fresh `OperatorExecution` and
returns it |
| `initOperatorExecution(opId, Some(inherited))` | **deep-clones** the
inherited `OperatorExecution`; mutations on the original do not leak into the
clone |
| second `initOperatorExecution(opId)` for the same id | throws
`AssertionError` |
| `getOperatorExecution` / `hasOperatorExecution` /
`getAllOperatorExecutions` | retrieval semantics |
| `initLinkExecution(link)` | registers a fresh `LinkExecution`; second call
for the same link throws `AssertionError`; distinct links and their inner
channel-executions stay independent |
| `getStats` | one `OperatorMetrics` per registered `OperatorExecution`,
keyed by `opId`; empty when no operator is registered |
| `getState` / `isCompleted` | for a region with no ports, vacuously
`COMPLETED` |
## Notes
While writing `OperatorExecutionSpec` I discovered a
**docstring-vs-implementation mismatch** in
`OperatorExecution.initWorkerExecution`:
```scala
def initWorkerExecution(workerId: ActorVirtualIdentity): WorkerExecution = {
assert(
!workerExecutions.contains(workerId), // <-- checks VALUES, not KEYS
s"WorkerExecution already exists for workerId: $workerId"
)
workerExecutions.put(workerId, WorkerExecution())
...
}
```
`workerExecutions` is a `java.util.concurrent.ConcurrentHashMap` and
`.contains(Object)` is the legacy `Hashtable` method that checks **values**,
not **keys**. So the assertion never fires and the second call silently
overwrites the prior `WorkerExecution` — contradicting the docstring (`"throws
AssertionError"`).
This PR pins the **current** behavior with a characterization test and the
**intended** behavior with `pendingUntilFixed`, so the day the implementation
is corrected (`containsKey`) CI surfaces it via the
`pendingUntilFixed`-pass-now-fails signal and the characterization test starts
failing. The fix itself is out of scope for a test-coverage PR.
## Test plan
- [x] `WorkflowExecutionService/testOnly
org.apache.texera.amber.engine.architecture.controller.execution.OperatorExecutionSpec
org.apache.texera.amber.engine.architecture.controller.execution.RegionExecutionSpec`
— 28 succeed, 1 pending (`pendingUntilFixed` for the duplicate-id assertion)
- [x] `scalafmtCheckAll` — clean
- [ ] CI to confirm
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]