Yicong-Huang opened a new issue, #4514:
URL: https://github.com/apache/texera/issues/4514

   ### What happened?
   
   `ResumeHandler.resumeWorkflow` 
(`amber/src/main/scala/org/apache/texera/amber/engine/architecture/controller/promisehandlers/ResumeHandler.scala:60-70`)
 finishes by emitting `ExecutionStatsUpdate` and `RuntimeStatisticsPersist` to 
the client, but does **not** emit 
`ExecutionStateUpdate(cp.workflowExecution.getState)`.
   
   This is asymmetric with `PauseHandler.scala:88-90`, which does emit the 
state update at the end of pause.
   
   ```scala
   // PauseHandler (correct)
   sendToClient(ExecutionStatsUpdate(stats))
   sendToClient(RuntimeStatisticsPersist(stats))
   sendToClient(ExecutionStateUpdate(cp.workflowExecution.getState))   // ← present
   
   // ResumeHandler (missing)
   sendToClient(ExecutionStatsUpdate(stats))
   sendToClient(RuntimeStatisticsPersist(stats))
   // ← no ExecutionStateUpdate
   ```
   
   The data is already correct on the server side: each `resumeWorker(...)` 
returns a `WorkerStateResponse` and `ResumeHandler` updates its internal 
`WorkerExecution` state with that result, so `cp.workflowExecution.getState` 
correctly returns `RUNNING` after the resume futures complete. The state simply 
never gets broadcast to clients.
   
   **Expected:** after resume completes, clients receive 
`ExecutionStateUpdate(RUNNING)` (mirroring how pause produces 
`ExecutionStateUpdate(PAUSED)`).
   **Actual:** no `ExecutionStateUpdate` is sent.
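
   The expected event sequence can be sketched with a small self-contained model. All types below are hypothetical stand-ins that only borrow names from this issue; they are not the real Texera classes, and `RecordingController` exists purely for illustration. The one line that matters is the final `sendToClient` call in `resume`, which is what `ResumeHandler` currently lacks.

   ```scala
   import scala.collection.mutable.ListBuffer

   // Hypothetical stand-ins; only the names mirror the issue text.
   sealed trait WorkflowState
   case object RUNNING extends WorkflowState
   case object PAUSED extends WorkflowState

   sealed trait ClientEvent
   case class ExecutionStatsUpdate(stats: Map[String, Long]) extends ClientEvent
   case class RuntimeStatisticsPersist(stats: Map[String, Long]) extends ClientEvent
   case class ExecutionStateUpdate(state: WorkflowState) extends ClientEvent

   // Records every event that would be sent to the client (assumption: a
   // simple in-memory buffer replaces the real client channel).
   class RecordingController {
     val sent: ListBuffer[ClientEvent] = ListBuffer.empty
     private var state: WorkflowState = RUNNING

     private def sendToClient(e: ClientEvent): Unit = { sent += e; () }

     def pause(stats: Map[String, Long]): Unit = {
       state = PAUSED
       sendToClient(ExecutionStatsUpdate(stats))
       sendToClient(RuntimeStatisticsPersist(stats))
       sendToClient(ExecutionStateUpdate(state))
     }

     def resume(stats: Map[String, Long]): Unit = {
       state = RUNNING
       sendToClient(ExecutionStatsUpdate(stats))
       sendToClient(RuntimeStatisticsPersist(stats))
       // The emission this issue asks for, mirroring pause.
       sendToClient(ExecutionStateUpdate(state))
     }
   }
   ```

   With this shape, a pause followed by a resume yields two `ExecutionStateUpdate` events, `PAUSED` and then `RUNNING`, in that order.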
   
   Consequences:
   - `ExecutionStatsService` and `ExecutionConsoleService` (the existing 
consumers) miss the resume → RUNNING transition unless they observe it 
indirectly via stats.
   - Tests that try to wait for "workflow is running again after resume" cannot 
use the `ExecutionStateUpdate` callback path used elsewhere; they fall back to 
`Thread.sleep` (e.g. `PauseSpec.shouldPause`).
   
   ### How to reproduce?
   
   1. Start a workflow via `client.controllerInterface.startWorkflow(...)`.
   2. Register `client.registerCallback[ExecutionStateUpdate](evt => ...)`.
   3. Call `client.controllerInterface.pauseWorkflow(...)` — observe that 
`ExecutionStateUpdate(PAUSED)` fires.
   4. Call `client.controllerInterface.resumeWorkflow(...)` — observe that 
**no** `ExecutionStateUpdate` event fires, even though the aggregated state has 
transitioned back to `RUNNING`.
   
   A minimal in-tree repro: `PauseSpec.shouldPause` already had to insert 
`Thread.sleep(400)` after resume because no state event arrives.
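
   Once the event is emitted, a test could block on the callback instead of sleeping. The following is a minimal sketch under stated assumptions: `FakeClient`, `StateEvent`, and `AwaitState.awaitState` are hypothetical names invented here and are not part of the Texera client API; only `scala.concurrent.Promise` and `Await` are real library calls.

   ```scala
   import scala.concurrent.{Await, Promise}
   import scala.concurrent.duration._

   // Hypothetical miniature of a callback registry; not the Texera client.
   sealed trait State
   case object RUNNING extends State
   case object PAUSED extends State
   case class StateEvent(state: State)

   class FakeClient {
     private var callbacks: List[StateEvent => Unit] = Nil
     def registerCallback(cb: StateEvent => Unit): Unit = callbacks ::= cb
     def emit(e: StateEvent): Unit = callbacks.foreach(_(e))
   }

   object AwaitState {
     // Registers a callback, runs `trigger`, then waits until an event carrying
     // `target` has been seen. Throws a TimeoutException if none arrives in time.
     def awaitState(client: FakeClient, target: State, timeout: FiniteDuration)(
         trigger: => Unit
     ): State = {
       val seen = Promise[State]()
       client.registerCallback { e =>
         if (e.state == target) { seen.trySuccess(e.state); () }
       }
       trigger
       Await.result(seen.future, timeout)
     }
   }
   ```

   The callback is registered before the trigger runs so that an event emitted synchronously is not missed; a fixed `Thread.sleep` offers no such guarantee and either wastes time or races.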
   
   ### Version
   
   1.1.0-incubating (Pre-release/Master)
   
   ### Commit Hash (Optional)
   
   e635dd027d
   
   ### Relevant log output
   
   N/A — the bug is the absence of an event, not a faulty one.

