aglinxinyuan opened a new issue, #5709: URL: https://github.com/apache/texera/issues/5709
### Feature Summary

When a region re-executes, `RegionExecutionCoordinator` recreates each operator's Iceberg output and state documents. An operator whose output **accumulates across runs**, e.g. a loop body that re-executes once per iteration, needs its existing storage preserved rather than clobbered on every region invocation. There is currently no way for an operator to opt into reusing its output storage across re-executions.

### Proposed Solution or Design

- Add a `reusesOutputStorageOnReExecution: Boolean = false` field to `PhysicalOp`, plus a `withReusesOutputStorageOnReExecution` builder.
- Add a pure `provisionOutputDocument(uri, reuseExistingStorage, documentExists, createDocument)` decision function in `RegionExecutionCoordinator`: when the owning operator sets the flag and the document already exists, reuse it; otherwise (re)create it. Unit-test all four combinations of the reuse × exists matrix.
- Name the flag for the behavior the scheduler checks, not for the operator that sets it, so any future operator needing the same treatment can opt in.

The default of `false` keeps every existing operator recreating its storage as before (no behavior change). The for-loop feature sets the flag on Loop End.

### Affected Area

- Workflow Engine (Amber)
- Storage / Metadata
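
To make the proposed decision concrete, here is a minimal Scala sketch of the flag, its builder, and the reuse-or-recreate function. It is an illustration under assumptions, not the actual Texera code: `PhysicalOpSketch`, `ProvisioningSketch`, the `ProvisionDecision` return type, and the parameter types (`URI`, `URI => Boolean`, `URI => Unit`) are all hypothetical. The issue names only the flag, the builder, and the four parameters of `provisionOutputDocument`.

```scala
import java.net.URI

// Hypothetical stand-in for PhysicalOp; only the proposed flag is modeled here.
final case class PhysicalOpSketch(
    id: String,
    reusesOutputStorageOnReExecution: Boolean = false // default keeps today's behavior
) {
  // Builder named in the issue; it returns a copy rather than mutating.
  def withReusesOutputStorageOnReExecution(reuse: Boolean): PhysicalOpSketch =
    copy(reusesOutputStorageOnReExecution = reuse)
}

// Assumed result type so callers and tests can observe which branch was taken.
sealed trait ProvisionDecision
case object Reused extends ProvisionDecision
case object Created extends ProvisionDecision

object ProvisioningSketch {
  // Reuse only when the operator opted in AND the document already exists;
  // in every other case (re)create it. Storage access is injected through
  // documentExists and createDocument, so the decision is testable without
  // a real storage backend.
  def provisionOutputDocument(
      uri: URI,
      reuseExistingStorage: Boolean,
      documentExists: URI => Boolean,
      createDocument: URI => Unit
  ): ProvisionDecision =
    if (reuseExistingStorage && documentExists(uri)) Reused
    else {
      createDocument(uri)
      Created
    }
}
```

With this shape, the reuse × exists matrix reduces to four calls: only the (reuse = true, exists = true) case returns `Reused` and skips `createDocument`. The other three call `createDocument` exactly once, which matches the current recreate-always behavior for operators that leave the flag at `false`.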
