[PR] fix(python): allow worker state to transit READY -> COMPLETED [texera]

via GitHub Mon, 25 May 2026 16:41:37 -0700


mengw15 opened a new pull request, #5214:
URL: https://github.com/apache/texera/pull/5214


   ### What changes were proposed in this PR?
   
   A Python UDF worker whose upstream produces zero tuples receives an 
`EndChannel` marker before any data, so it never visits `RUNNING`. When 
`_process_end_channel` then called `complete()`, the state machine in `Context` 
rejected the transition because `READY`'s allowed targets were `{PAUSED, 
RUNNING}`, and the worker thread died with:
   
   ```
   InvalidTransitionException: Cannot transit from READY to COMPLETED
   ```
   
   The downstream operator then stayed stuck in `READY` and the workflow hung — 
the user-visible symptom in #5197.
   
   The Scala-side 
[`WorkerStateManager.scala`](https://github.com/apache/texera/blob/main/amber/src/main/scala/org/apache/texera/amber/engine/common/statetransition/WorkerStateManager.scala#L36)
 already lists `COMPLETED` in `READY`'s allowed targets:
   
   ```scala
   READY -> Set(PAUSED, RUNNING, COMPLETED),
   ```
   
   So this is purely a Python ↔ Scala parity drift. Add `WorkerState.COMPLETED` 
to the Python `READY` set so a worker that never received any tuples can 
complete cleanly.
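
To make the failure mode concrete, here is a minimal, self-contained sketch of a transition-table state machine. It is not the Texera implementation: the `StateManager` class, the `complete_without_data` helper, and every edge other than the two `READY` entries quoted above are assumptions made for illustration.

```python
from enum import Enum


class WorkerState(Enum):
    READY = "READY"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    COMPLETED = "COMPLETED"


class InvalidTransitionException(Exception):
    """Raised when the requested target is not allowed from the current state."""


class StateManager:
    """Toy state machine for illustration only (hypothetical, not Texera's class)."""

    def __init__(self, graph, initial):
        self._graph = graph
        self._state = initial

    def transit_to(self, target):
        # Reject any target that is not listed for the current state.
        if target not in self._graph.get(self._state, set()):
            raise InvalidTransitionException(
                f"Cannot transit from {self._state.name} to {target.name}"
            )
        self._state = target


# READY's targets before the fix, as described in this PR.
# The RUNNING / PAUSED / COMPLETED rows are assumed for the sketch.
GRAPH_BEFORE = {
    WorkerState.READY: {WorkerState.PAUSED, WorkerState.RUNNING},
    WorkerState.RUNNING: {WorkerState.PAUSED, WorkerState.COMPLETED},
    WorkerState.PAUSED: {WorkerState.RUNNING},
    WorkerState.COMPLETED: set(),
}

# After the fix, READY additionally allows COMPLETED.
GRAPH_AFTER = {
    **GRAPH_BEFORE,
    WorkerState.READY: GRAPH_BEFORE[WorkerState.READY] | {WorkerState.COMPLETED},
}


def complete_without_data(graph):
    """Simulate a worker that sees the end of its input before any tuple."""
    manager = StateManager(graph, WorkerState.READY)
    try:
        manager.transit_to(WorkerState.COMPLETED)
    except InvalidTransitionException as err:
        return str(err)
    return "COMPLETED"
```

With the pre-fix table, `complete_without_data(GRAPH_BEFORE)` returns the error message shown earlier; with the extra `COMPLETED` target, `complete_without_data(GRAPH_AFTER)` returns `"COMPLETED"`.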
   
   While there, lift the state-transition graph out of `Context.__init__` into 
a module-level `WORKER_STATE_TRANSITIONS` constant so:
   
   - The test fixture imports it (single source of truth — the previous fixture 
independently duplicated the graph, which is what masked the parity gap from 
existing tests).
   - The graph is built once per process instead of once per `Context` instance.
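
The two points above can be sketched roughly as follows. This is an assumed shape, not the actual contents of `context.py`: the `Context` body, the `transitions` attribute, the `graph_for_tests` helper, and all edges other than `READY`'s are hypothetical, and wrapping the table in `MappingProxyType` with `frozenset` values is an optional hardening that this PR does not claim to do.

```python
from enum import Enum
from types import MappingProxyType


class WorkerState(Enum):
    READY = "READY"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    COMPLETED = "COMPLETED"


# Built once, when the module is imported. Only the READY row is taken from
# the PR description; the other rows are assumptions for this sketch.
WORKER_STATE_TRANSITIONS = MappingProxyType(
    {
        WorkerState.READY: frozenset(
            {WorkerState.PAUSED, WorkerState.RUNNING, WorkerState.COMPLETED}
        ),
        WorkerState.RUNNING: frozenset({WorkerState.PAUSED, WorkerState.COMPLETED}),
        WorkerState.PAUSED: frozenset({WorkerState.RUNNING}),
        WorkerState.COMPLETED: frozenset(),
    }
)


class Context:
    """Hypothetical stand-in for the real Context class."""

    def __init__(self):
        # Reference the shared table instead of rebuilding it per instance.
        self.transitions = WORKER_STATE_TRANSITIONS


def graph_for_tests():
    """What a test fixture might hand to the state manager under test."""
    # Returning the production constant means tests and production code
    # read the same table, so the two cannot disagree.
    return WORKER_STATE_TRANSITIONS
```

Because every `Context` holds a reference to the same object, two instances share one table, and a fixture built on `graph_for_tests()` picks up any future change to the production graph automatically.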
   
   ### Any related issues, documentation, discussions?
   
   Closes #5197.
   
   ### How was this PR tested?
   
   Added a new regression case in `test_state_manager.py`:
   
   ```python
   def test_it_can_transit_directly_from_ready_to_completed(self, state_manager):
       state_manager.transit_to(WorkerState.READY)
       state_manager.transit_to(WorkerState.COMPLETED)
       state_manager.assert_state(WorkerState.COMPLETED)
   ```
   
   The fixture now imports the production `WORKER_STATE_TRANSITIONS` constant 
from `context.py`, so the test graph can no longer drift from the production 
graph.
   
   Run locally:
   
   ```
   python -m pytest amber/src/test/python/core/architecture/managers/ -v
   ```
   
   Result: `89 passed` (4 existing `TestStateManager` cases + 1 new + 84 other 
manager tests).
   
   `ruff check` and `ruff format --check` both clean on the touched files.
   
   Manually verified the issue's reproducer workflow (1-out PythonUDF producing 
0 tuples → downstream PythonUDF) completes cleanly after the fix; previously it 
hung with the worker stuck in `READY`.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (claude-opus-4-7)

