aglinxinyuan commented on code in PR #5707:
URL: https://github.com/apache/texera/pull/5707#discussion_r3411859097


##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -133,6 +133,29 @@ object DocumentFactory {
     }
   }
 
+  /**
+    * Return the document at `uri`: when `reuseExisting` is set and a document
+    * already exists there, open and return the existing one -- so a caller whose
+    * output accumulates across re-runs (e.g. a LoopEnd port whose region
+    * re-executes once per loop iteration) keeps the already-populated document
+    * instead of clobbering it, since `createDocument` overrides any existing
+    * document. Otherwise create it. Either way the caller gets the document, so
+    * the call site need not branch on create-vs-reuse.
+    *
+    * `exists` / `open` / `create` default to this object's own `documentExists`
+    * / `openDocument` / `createDocument`; they are parameterized only so the
+    * create-or-reuse decision can be unit-tested without an iceberg backend.
+    */
+  def createOrReuseDocument(
+      uri: URI,
+      schema: Schema,
+      reuseExisting: Boolean,
+      exists: URI => Boolean = documentExists,
+      open: URI => VirtualDocument[_] = (u: URI) => openDocument(u)._1,
+      create: (URI, Schema) => VirtualDocument[_] = createDocument
+  ): VirtualDocument[_] =
+    if (reuseExisting && exists(uri)) open(uri) else create(uri, schema)
+

Review Comment:
   Fair point, thanks. `createOrReuseDocument` is actually called by
   `RegionExecutionCoordinator` for every output port
   (RegionExecutionCoordinator.scala:604), but the guard just above asserts
   that `reusesOutputStorage` is false, so today it always takes the *create*
   path. The reuse branch is exercised only by tests until the loop operators
   set the flag; the follow-up loop PR is its real consumer. I split the
   mechanism out ahead of the feature to keep each PR small and reviewable,
   but I agree that an API is best reviewed alongside its consumer. In future
   API-introducing PRs I'll include the usage, or keep the API with its
   consumer.
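
   For reference, a minimal sketch of how the create-or-reuse decision can be
   exercised with stubbed functions and no storage backend. `Schema`,
   `VirtualDocument`, `StubDocument`, `CreateOrReuseSketch`, and the `vfs:///`
   URI are simplified, hypothetical stand-ins so the snippet runs on its own;
   they are not the real Texera types or the PR's actual tests.

   ```scala
   import java.net.URI

   // Hypothetical stand-ins for the real types, so the sketch is self-contained.
   final case class Schema(columns: List[String])
   trait VirtualDocument[T] { def uri: URI }
   final case class StubDocument(uri: URI, origin: String) extends VirtualDocument[String]

   object CreateOrReuseSketch {
     // Same decision as the method in the diff, but with every dependency injected.
     def createOrReuseDocument(
         uri: URI,
         schema: Schema,
         reuseExisting: Boolean,
         exists: URI => Boolean,
         open: URI => VirtualDocument[_],
         create: (URI, Schema) => VirtualDocument[_]
     ): VirtualDocument[_] =
       if (reuseExisting && exists(uri)) open(uri) else create(uri, schema)

     // Runs the decision against stubs and reports which branch produced the document.
     def origin(reuseExisting: Boolean, alreadyExists: Boolean): String = {
       val uri = new URI("vfs:///sketch/result") // made-up URI, for illustration only
       val doc = createOrReuseDocument(
         uri,
         Schema(List("id")),
         reuseExisting,
         exists = _ => alreadyExists,
         open = u => StubDocument(u, "opened"),
         create = (u, _) => StubDocument(u, "created")
       )
       doc match {
         case StubDocument(_, o) => o
         case _                  => "unknown"
       }
     }
   }
   ```

   In this sketch, `CreateOrReuseSketch.origin(reuseExisting = true,
   alreadyExists = true)` is the only combination that takes the *open* path;
   the other three combinations fall through to *create*, which matches the
   current behaviour where the flag is always false.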



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
