The GitHub Actions job "Required Checks" on texera.git/gh-readonly-queue/main/pr-5707-b7fe06e9c22a03b4ac5ce201e9b7eb244cbc20eb has succeeded.
Run started by GitHub user aglinxinyuan (triggered by aglinxinyuan).

Head commit for run:
226ebde6d372d2e55d0c3a988ffd765288b4faa5 / Xinyuan Lin <[email protected]>
feat(scheduling): reuse output storage across region re-executions (#5707)

### What changes were proposed in this PR?

Adds an opt-in mechanism for an output port to **reuse** its storage
when the owning operator's region re-executes, instead of recreating the
document each time. The mechanism is dormant and behavior-preserving:
no operator sets the flag in this PR.

- `OutputPort` gains a `reuseStorage: Boolean` proto field (alongside
`blocking` / `mode`). It marks a port whose output accumulates across
region re-executions — e.g. a Loop End port whose result builds up over
the iterations of its own loop.
- `DocumentFactory.createOrReuseDocument(uri, schema, reuseExisting, …)`
is the create-or-reuse decision: when reuse is requested and a document
already exists it opens and returns that one; otherwise it creates a
fresh one. It always returns the document, so the call site does not
branch.
- `RegionExecutionCoordinator` reads each output port's `reuseStorage`
flag while provisioning that port's result/state documents and routes
through `createOrReuseDocument`.
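
As a rough illustration of the create-or-reuse decision described above, here is a minimal sketch. It is not the PR's implementation: `Document`, `InMemoryStore`, and `DocumentFactorySketch` are hypothetical stand-ins, the in-memory map replaces the real storage backend, and the signature is simplified because the remaining parameters of `createOrReuseDocument` are elided in the description.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a stored result/state document.
final class Document(val uri: String, val schema: String) {
  val rows: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty
}

// Hypothetical in-memory backend, used here in place of the real storage layer.
final class InMemoryStore {
  private val docs = mutable.Map.empty[String, Document]
  var existsProbes: Int = 0 // counts how often existence was checked

  def exists(uri: String): Boolean = { existsProbes += 1; docs.contains(uri) }
  def open(uri: String): Document = docs(uri)
  def create(uri: String, schema: String): Document = {
    val doc = new Document(uri, schema)
    docs(uri) = doc // replaces any previous document at this URI
    doc
  }
}

object DocumentFactorySketch {
  // Always returns a document, so the caller does not branch.
  // `&&` short-circuits: existence is never probed when reuse is not requested.
  def createOrReuseDocument(
      store: InMemoryStore,
      uri: String,
      schema: String,
      reuseExisting: Boolean
  ): Document =
    if (reuseExisting && store.exists(uri)) store.open(uri)
    else store.create(uri, schema)
}
```

Under these assumptions, two consecutive calls with `reuseExisting = true` for the same URI return the same document with its accumulated rows, while a call with `reuseExisting = false` replaces it with a fresh, empty one without probing for existence.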

| port flag | region re-run behavior |
|---|---|
| `false` (every operator today) | recreate output/state documents (unchanged) |
| `true` (set by Loop End in the loop PR) | keep and reopen the existing documents |

A runtime guard in `RegionExecutionCoordinator` asserts no port sets
`reuseStorage` for now: the flag activates only with the loop operators,
which are not yet on `main`. The guard keeps the dormant reuse path from
being silently exercised before its consumer exists, and is removed when
the loop operators land.
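
A guard of this kind might look roughly like the sketch below. The simplified `OutputPort` case class and the `ReuseGuardSketch.assertNoReuse` helper are hypothetical; the real `OutputPort` is generated from the proto definition, and the actual check in `RegionExecutionCoordinator` may use a different mechanism and message.

```scala
// Hypothetical, simplified port model; the real OutputPort is a generated proto class.
final case class OutputPort(
    id: String,
    blocking: Boolean = false,
    reuseStorage: Boolean = false
)

object ReuseGuardSketch {
  // Fails fast if any port in the region opts into storage reuse.
  def assertNoReuse(ports: Seq[OutputPort]): Unit = {
    val offending = ports.filter(_.reuseStorage).map(_.id)
    require(
      offending.isEmpty,
      s"reuseStorage is not supported yet, but is set on: ${offending.mkString(", ")}"
    )
  }
}
```

In this sketch, a region whose ports all leave `reuseStorage` at `false` passes silently, and any port that sets it causes `require` to throw an `IllegalArgumentException` naming the offending port.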

### Any related issues, documentation, discussions?

Resolves #5709 (sub-issue of #4442 "Introduce for loop"). Split out of
#5700 to keep that PR reviewable, per @Xiao-zhen-Liu's
[review](https://github.com/apache/texera/pull/4206#pullrequestreview-4482667715).

### How was this PR tested?

- `DocumentFactorySpec` — pins the create-or-reuse decision (the reuse ×
exists matrix plus the "no-reuse never probes existence" short-circuit)
with injected document stubs, no iceberg backend.
- `OutputPortReuseFlagSpec` — guards that no registered operator enables
`reuseStorage` on any output port.
- `WorkflowCore` / `WorkflowOperator` / `WorkflowExecutionService`
compile; scalafmt + scalafix clean.

### Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF.

Report URL: https://github.com/apache/texera/actions/runs/27654453057

With regards,
GitHub Actions via GitBox
