aglinxinyuan opened a new issue, #5709:
URL: https://github.com/apache/texera/issues/5709

   ### Feature Summary
   
   When a region re-executes, `RegionExecutionCoordinator` recreates each operator's Iceberg output and state documents. An operator whose output **accumulates across runs** (e.g. a loop body that re-executes once per iteration) needs its existing storage preserved rather than overwritten on every region execution. Currently, an operator has no way to opt into reusing its output storage across re-executions.
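To illustrate why recreation loses accumulated output, here is a toy in-memory model in Scala. It is not Texera's storage API: a "document" is simply a row buffer keyed by a hypothetical URI string, and the loop body emits one row per region execution.

```scala
import scala.collection.mutable

// Toy model only: a "document" is an in-memory buffer of rows keyed by a
// hypothetical output URI. This is not Texera's Iceberg storage API.
object ClobberDemo {
  /** Simulate `iterations` region executions of a loop body that emits one row per run.
    * If `recreateEachRun` is true, the document is reset before every run,
    * mimicking the current coordinator behavior described in the issue.
    */
  def runLoop(iterations: Int, recreateEachRun: Boolean): List[String] = {
    val store = mutable.Map.empty[String, mutable.ArrayBuffer[String]]
    val uri = "loop-end-output" // hypothetical document key
    for (i <- 1 to iterations) {
      if (recreateEachRun || !store.contains(uri)) {
        // (Re)create the document: any rows from earlier iterations are dropped.
        store(uri) = mutable.ArrayBuffer.empty[String]
      }
      store(uri) += s"row-from-iteration-$i"
    }
    store(uri).toList
  }
}
```

With recreation on every run, only the last iteration's row survives; with reuse, all iterations' rows accumulate.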
   
   ### Proposed Solution or Design
   
   - Add a `reusesOutputStorageOnReExecution: Boolean = false` field to 
`PhysicalOp` plus a `withReusesOutputStorageOnReExecution` builder.
   - Add a pure `provisionOutputDocument(uri, reuseExistingStorage, documentExists, createDocument)` decision function in `RegionExecutionCoordinator`: when the owning operator sets the flag and the document already exists, reuse it; otherwise, (re)create it. Unit-test all four combinations of the reuse × exists matrix.
   - Name the flag for the behavior the scheduler checks, not the operator that 
sets it, so any future operator needing the same treatment can opt in.
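The decision function proposed above can be sketched in Scala as follows. This is a minimal sketch under stated assumptions, not the actual implementation: it assumes `uri` is a `java.net.URI`, that `documentExists` and `createDocument` are passed in as functions, and it introduces a hypothetical `ProvisionOutcome` return type, since the issue does not specify any types. Injecting the two effects is what keeps the decision pure and lets the reuse × exists matrix be tested without real Iceberg storage.

```scala
import java.net.URI

// Hypothetical result type: the issue does not say what the function returns.
sealed trait ProvisionOutcome
case object ReusedExisting extends ProvisionOutcome
case object CreatedFresh extends ProvisionOutcome

object StorageProvisioning {
  /** Decide whether to reuse or (re)create the output document at `uri`.
    * Storage access is injected via `documentExists` and `createDocument`,
    * so the decision logic can be unit-tested without a real catalog.
    */
  def provisionOutputDocument(
      uri: URI,
      reuseExistingStorage: Boolean,
      documentExists: URI => Boolean,
      createDocument: URI => Unit
  ): ProvisionOutcome = {
    if (reuseExistingStorage && documentExists(uri)) {
      // The owning operator opted in and storage already exists: keep accumulated output.
      ReusedExisting
    } else {
      // Default path, or nothing to reuse yet: create (or overwrite) the document.
      createDocument(uri)
      CreatedFresh
    }
  }
}
```

Because `&&` short-circuits, `documentExists` is consulted only when the operator has opted in, so operators using the default pay no extra existence check. Only the (reuse = true, exists = true) cell of the matrix skips `createDocument`.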
   
   Defaulting the flag to `false` keeps every existing operator recreating its storage as before (no behavior change). The for-loop feature sets the flag on the Loop End operator.
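A sketch of the opt-in flag with its default, assuming `PhysicalOp` follows a case-class-with-`copy` builder style. `PhysicalOpSketch` is a hypothetical, heavily simplified stand-in that carries only the fields needed here; the operator ids are illustrative.

```scala
// Hypothetical, simplified stand-in for Amber's PhysicalOp: only the fields
// needed to illustrate the opt-in flag. The real class has many more fields.
final case class PhysicalOpSketch(
    id: String,
    // Defaults to false so existing operators keep recreating their storage.
    reusesOutputStorageOnReExecution: Boolean = false
) {
  // Copy-based builder: returns a new instance with only the flag changed.
  def withReusesOutputStorageOnReExecution(reuses: Boolean): PhysicalOpSketch =
    this.copy(reusesOutputStorageOnReExecution = reuses)
}

object FlagDefaults {
  // An ordinary operator never touches the flag, so its behavior is unchanged.
  val ordinaryOp: PhysicalOpSketch = PhysicalOpSketch("filter-1")
  // The for-loop feature opts Loop End in; "loop-end-1" is an illustrative id.
  val loopEndOp: PhysicalOpSketch =
    PhysicalOpSketch("loop-end-1").withReusesOutputStorageOnReExecution(true)
}
```

Naming the field after the scheduler-visible behavior, rather than after Loop End, means any other operator can opt in with the same one-line builder call.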
   
   ### Affected Area
   
   - Workflow Engine (Amber)
   - Storage / Metadata

