mengw15 opened a new pull request, #5214:
URL: https://github.com/apache/texera/pull/5214
### What changes were proposed in this PR?
A Python UDF worker whose upstream produces zero tuples receives an
`EndChannel` marker before any data, so it never visits `RUNNING`. When
`_process_end_channel` then calls `complete()`, the state machine in `Context`
rejected the transition because `READY`'s allowed targets were `{PAUSED,
RUNNING}`, and the worker thread died with:
```
InvalidTransitionException: Cannot transit from READY to COMPLETED
```
The downstream operator then stayed stuck in `READY` and the workflow hung,
which is the user-visible symptom in #5197.
The Scala-side
[`WorkerStateManager.scala`](https://github.com/apache/texera/blob/main/amber/src/main/scala/org/apache/texera/amber/engine/common/statetransition/WorkerStateManager.scala#L36)
already lists `COMPLETED` in `READY`'s allowed targets:
```scala
READY -> Set(PAUSED, RUNNING, COMPLETED),
```
So this is purely a Python ↔ Scala parity drift. Add `WorkerState.COMPLETED`
to the Python `READY` set so a worker that never received any tuples can
complete cleanly.
While here, lift the state-transition graph out of `Context.__init__` into
a module-level `WORKER_STATE_TRANSITIONS` constant so that:
- The test fixture imports it as the single source of truth. The previous
fixture independently duplicated the graph, which is what masked the parity
gap from existing tests.
- The graph is built once per process instead of once per `Context` instance.
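
For reviewers, here is a minimal, self-contained sketch of the shape of the change. It is not the code in `context.py`: the `WorkerState`, `InvalidTransitionException`, and `StateManager` definitions below are simplified stand-ins, and every row of the graph other than `READY` is an assumption made for illustration. Only the `READY` row reflects what this PR states.

```python
from enum import Enum, auto


class WorkerState(Enum):
    # Stand-in enum; the real one lives in the Texera code base.
    UNINITIALIZED = auto()
    READY = auto()
    RUNNING = auto()
    PAUSED = auto()
    COMPLETED = auto()


class InvalidTransitionException(Exception):
    """Stand-in for the exception shown in the traceback above."""


# Module-level constant: built once at import time and shared by production
# code and tests. Only the READY row is taken from this PR; the remaining
# rows are assumed for the sake of a runnable example.
WORKER_STATE_TRANSITIONS = {
    WorkerState.UNINITIALIZED: frozenset({WorkerState.READY}),
    WorkerState.READY: frozenset(
        {WorkerState.PAUSED, WorkerState.RUNNING, WorkerState.COMPLETED}
    ),
    WorkerState.RUNNING: frozenset({WorkerState.PAUSED, WorkerState.COMPLETED}),
    WorkerState.PAUSED: frozenset({WorkerState.RUNNING}),
    WorkerState.COMPLETED: frozenset(),
}


class StateManager:
    """Simplified stand-in that only enforces the transition graph."""

    def __init__(self, graph, initial):
        self._graph = graph
        self._state = initial

    def transit_to(self, target):
        # Reject any target that is not listed for the current state.
        if target not in self._graph[self._state]:
            raise InvalidTransitionException(
                f"Cannot transit from {self._state.name} to {target.name}"
            )
        self._state = target

    def assert_state(self, expected):
        assert self._state is expected


# A worker that receives EndChannel before any tuple goes READY -> COMPLETED.
manager = StateManager(WORKER_STATE_TRANSITIONS, WorkerState.UNINITIALIZED)
manager.transit_to(WorkerState.READY)
manager.transit_to(WorkerState.COMPLETED)
manager.assert_state(WorkerState.COMPLETED)
```

Under these assumptions, dropping `WorkerState.COMPLETED` from the `READY` row makes the second `transit_to` call raise `InvalidTransitionException`, which mirrors the failure described above.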
### Any related issues, documentation, discussions?
Closes #5197.
### How was this PR tested?
Added a new regression case in `test_state_manager.py`:
```python
def test_it_can_transit_directly_from_ready_to_completed(self, state_manager):
    state_manager.transit_to(WorkerState.READY)
    state_manager.transit_to(WorkerState.COMPLETED)
    state_manager.assert_state(WorkerState.COMPLETED)
```
The fixture now imports the production `WORKER_STATE_TRANSITIONS` constant
from `context.py`, so future drift between the test graph and the production
graph is impossible.
Run locally:
```
python -m pytest amber/src/test/python/core/architecture/managers/ -v
```
Result: `89 passed` (4 existing `TestStateManager` cases + 1 new + 84 other
manager tests).
`ruff check` and `ruff format --check` both clean on the touched files.
Manually verified the issue's reproducer workflow (1-out PythonUDF producing
0 tuples → downstream PythonUDF) completes cleanly after the fix; previously it
hung with the worker stuck in `READY`.
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)