bobbai00 opened a new issue, #4576:
URL: https://github.com/apache/texera/issues/4576

   ### What happened?
   
   When a HashJoin operator is executed through the sync execution API (`POST 
/api/execution/{wid}/{cuid}/run`, used by `agent-service`), it consistently 
returns an empty result. The build phase appears to finish, then the execution 
terminates before the probe phase produces output.
   
   Expected: HashJoin returns the joined rows, same as a normal frontend 
execution.
   
   ### How to reproduce?
   
   1. Build a workflow with a HashJoin (any two upstream sources + HashJoin).
   2. Trigger execution through the agent (`executeOperator` tool) targeting 
the HashJoin, or call `/api/execution/{wid}/{cuid}/run` directly with 
`targetOperatorIds = [hashJoinId]`.
   3. The response comes back with `success: true`, `state: "Completed"`, and 
`outputTuples: 0`.
   
   ### Version
   
   1.1.0-incubating (Pre-release/Master)
   
   ### Commit Hash (Optional)
   
   af313e7160488f2ca5ae25bf88277b37a6bf5c08
   
   ### Possible cause
   
   `HashJoinOpDesc.getPhysicalPlan` produces two PhysicalOps (`build`, `probe`) 
sharing one logical id, separated by a blocking edge. The scheduler places them 
in two regions and runs them sequentially.
   
   `SyncExecutionResource.allTargetsCompleted` checks 
`stats.operatorInfo.get(opId).operatorState == COMPLETED`, where `operatorInfo` 
is produced by `WorkflowExecution.getAllRegionExecutionsStats`. That method 
aggregates by `logicalOpId.id` over only the *registered* `RegionExecution`s. 
Between "build region completed" and "probe region instantiated," only the 
build PhysicalOp is registered, so `aggregateStates(Iterable(COMPLETED))` 
returns `COMPLETED`. The sync resource then takes the `TargetResultsReady` 
branch, kills the execution, and reads the probe's (still-empty) output storage.
   
   The same race applies to any logical operator that compiles to multiple 
PhysicalOps separated by a blocking edge (e.g. Aggregate). It does not surface 
in normal frontend execution because the frontend waits for full workflow 
termination instead of per-target completion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to