Yicong-Huang commented on code in PR #5707:
URL: https://github.com/apache/texera/pull/5707#discussion_r3411568942
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -133,6 +133,29 @@ object DocumentFactory {
}
}
+ /**
+ * Return the document at `uri`: when `reuseExisting` is set and a document
+ * already exists there, open and return the existing one -- so a caller whose
+ * output accumulates across re-runs (e.g. a LoopEnd port whose region
+ * re-executes once per loop iteration) keeps the already-populated document
+ * instead of clobbering it, since `createDocument` overrides any existing
+ * document. Otherwise create it. Either way the caller gets the document, so
+ * the call site need not branch on create-vs-reuse.
+ *
+ * `exists` / `open` / `create` default to this object's own `documentExists`
+ * / `openDocument` / `createDocument`; they are parameterized only so the
+ * create-or-reuse decision can be unit-tested without an iceberg backend.
+ */
+ def createOrReuseDocument(
+ uri: URI,
+ schema: Schema,
+ reuseExisting: Boolean,
+ exists: URI => Boolean = documentExists,
+ open: URI => VirtualDocument[_] = (u: URI) => openDocument(u)._1,
+ create: (URI, Schema) => VirtualDocument[_] = createDocument
+ ): VirtualDocument[_] =
+ if (reuseExisting && exists(uri)) open(uri) else create(uri, schema)
+
Review Comment:
This API is not used anywhere except in tests, which makes it dead source
code. As we discussed, it is better to review an API together with its
call site/consumer/user. We can let it go for this PR, since I can infer how
this API will be used from its signature, but for future PRs please include
how the API is or will be used.
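
For illustration only, a call site might look roughly like the sketch below. It is not code from this PR: `Doc`, `InMemoryStore`, `LoopEndPortSketch`, and `runIteration` are hypothetical stand-ins, and the local `Schema` is a placeholder for the real one, so the snippet runs without an iceberg backend. Only the create-or-reuse decision itself mirrors the diff above.

```scala
import java.net.URI
import scala.collection.mutable

object LoopEndPortSketch {
  // Stand-in types (assumptions): not Texera's real Schema / VirtualDocument.
  final case class Schema(columns: List[String])
  final class Doc(val uri: URI, val schema: Schema) {
    val rows: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  }

  // In-memory stand-in for the storage backend (assumption).
  object InMemoryStore {
    private val docs = mutable.Map.empty[URI, Doc]
    def exists(uri: URI): Boolean = docs.contains(uri)
    def open(uri: URI): Doc = docs(uri)
    // Like `createDocument` in the diff, this overrides any existing document.
    def create(uri: URI, schema: Schema): Doc = {
      val doc = new Doc(uri, schema)
      docs(uri) = doc
      doc
    }
  }

  // Same decision as the one-liner in the diff, over the stand-in types.
  def createOrReuseDocument(
      uri: URI,
      schema: Schema,
      reuseExisting: Boolean,
      exists: URI => Boolean = InMemoryStore.exists,
      open: URI => Doc = InMemoryStore.open,
      create: (URI, Schema) => Doc = InMemoryStore.create
  ): Doc =
    if (reuseExisting && exists(uri)) open(uri) else create(uri, schema)

  // Hypothetical call site: a port obtains its output document once per
  // region execution and appends to it, without branching on create-vs-reuse.
  def runIteration(uri: URI, schema: Schema, isLoopEnd: Boolean, row: String): Doc = {
    val doc = createOrReuseDocument(uri, schema, reuseExisting = isLoopEnd)
    doc.rows += row
    doc
  }
}
```

With `isLoopEnd = true`, rows from successive iterations accumulate in the same document; with `isLoopEnd = false`, each run starts from a fresh document. That is the behavior I am assuming from the signature and the doc comment.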