aglinxinyuan opened a new pull request, #5707: URL: https://github.com/apache/texera/pull/5707
Split out of #5700 (loop operators) per [reviewer request](https://github.com/apache/texera/pull/4206#pullrequestreview-4482667715) to keep that PR reviewable. Independent (~30 lines + a unit test) and dormant until the loop feature uses it.

## What

- `PhysicalOp` gains a `reusesOutputStorageOnReExecution: Boolean = false` field + a `withReusesOutputStorageOnReExecution` builder.
- `RegionExecutionCoordinator` gains a pure `provisionOutputDocument(uri, reuseExistingStorage, documentExists, createDocument)` decision function, used per output port to decide create-vs-reuse based on the owning operator's flag.
- New `RegionOutputProvisioningSpec` unit-tests the decision function (the reuse×exists matrix plus the "no-reuse never probes existence" short-circuit).

## Why

When a region re-executes (e.g. a loop body), an operator whose output accumulates across runs must not have its Iceberg output/state documents clobbered on every invocation. The flag lets such an operator preserve its storage; every other operator keeps recreating as before. It is named for the behavior the scheduler checks, not the operator that sets it, so it is reusable.

## Impact

| operator flag | region re-run behavior |
|---|---|
| `false` (every operator today) | recreate output/state documents (unchanged) |
| `true` (set by Loop End in the loop PR) | reuse existing documents |

Dormant and behavior-preserving: no operator sets the flag in this PR.

## Test

`sbt "WorkflowExecutionService/testOnly *RegionOutputProvisioningSpec"`: 5 passing; compiles clean; scalafmt clean.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
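For readers without the diff at hand, the create-vs-reuse decision described under "What" might look roughly like the sketch below. It is not the PR's code. The parameter names come from the description, but the parameter types, the `Provisioning` result type, and the enclosing `OutputProvisioningSketch` object are assumptions made for illustration. The sketch also assumes that a document is created when reuse is requested but nothing exists yet (for example on the first run of a loop body).

```scala
import java.net.URI

// Illustrative sketch only; types and the result ADT are assumptions,
// not the actual Texera implementation.
object OutputProvisioningSketch {

  // Hypothetical result type so callers and tests can see which path was taken.
  sealed trait Provisioning
  case object Created extends Provisioning
  case object Reused extends Provisioning

  def provisionOutputDocument(
      uri: URI,
      reuseExistingStorage: Boolean,
      documentExists: URI => Boolean,
      createDocument: URI => Unit
  ): Provisioning =
    // `&&` short-circuits: when reuse is off, existence is never probed.
    if (reuseExistingStorage && documentExists(uri)) Reused
    else {
      // Either reuse is off, or reuse is on but nothing exists yet (assumed behavior).
      createDocument(uri)
      Created
    }

  def main(args: Array[String]): Unit = {
    val uri = URI.create("example:///result/port-0") // made-up URI
    val existing = scala.collection.mutable.Set.empty[URI]

    // First run with reuse on: nothing exists yet, so the document is created.
    println(provisionOutputDocument(uri, true, existing.contains, u => existing += u))
    // Re-execution with reuse on: the existing document is kept.
    println(provisionOutputDocument(uri, true, existing.contains, u => existing += u))
  }
}
```

Passing `documentExists` and `createDocument` as functions keeps the decision pure and lets a unit test count how often each is called, which is how a "never probes existence" property can be checked without touching real storage.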
