Yicong-Huang commented on code in PR #5707:
URL: https://github.com/apache/texera/pull/5707#discussion_r3423786315
##########
amber/src/main/scala/org/apache/texera/amber/engine/architecture/scheduling/RegionExecutionCoordinator.scala:
##########
@@ -576,8 +576,33 @@ class RegionExecutionCoordinator(
region.getOperator(outputPortId.opId).outputPorts(outputPortId.portId)._3
val schema =
schemaOptional.getOrElse(throw new IllegalStateException("Schema is
missing"))
- DocumentFactory.createDocument(resultURI, schema)
- DocumentFactory.createDocument(stateURI, State.schema)
+ // An output port whose storage accumulates across region re-executions
+ // (e.g. a LoopEnd port, whose output builds up over the iterations of
+ // its own loop) sets `reusesOutputStorage`, so the existing document
is
+ // preserved rather than clobbered by `createDocument`
(overrideIfExists
Review Comment:
let's reduce talking about "rather than", they maybe introduced due to
review of alternative solutions. I think comments better describe the final
state.
##########
common/workflow-core/src/main/protobuf/org/apache/texera/amber/core/workflow.proto:
##########
@@ -62,6 +62,11 @@ message OutputPort {
string displayName = 2;
bool blocking = 3;
OutputMode mode = 4;
+ // Whether storage at this port is reused (kept and appended to) rather than
+ // recreated when the owning operator's region re-executes -- e.g. a LoopEnd
+ // port whose output accumulates across the iterations of its own loop. The
+ // region scheduler reads this when provisioning the port's output document.
+ bool reusesOutputStorage = 5;
Review Comment:
nit: can just be `reuseStorage` as it is already on output port.
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -133,6 +133,29 @@ object DocumentFactory {
}
}
+ /**
+ * Return the document at `uri`: when `reuseExisting` is set and a document
+ * already exists there, open and return the existing one -- so a caller
whose
+ * output accumulates across re-runs (e.g. a LoopEnd port whose region
+ * re-executes once per loop iteration) keeps the already-populated document
+ * instead of clobbering it, since `createDocument` overrides any existing
+ * document. Otherwise create it. Either way the caller gets the document,
so
+ * the call site need not branch on create-vs-reuse.
Review Comment:
remove "Either way the caller gets the document, so the call site need not
branch on create-vs-reuse."
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]