aglinxinyuan commented on code in PR #4206:
URL: https://github.com/apache/texera/pull/4206#discussion_r3308683989


##########
amber/src/main/scala/org/apache/texera/amber/engine/architecture/scheduling/RegionExecutionCoordinator.scala:
##########
@@ -576,8 +576,17 @@ class RegionExecutionCoordinator(
           
region.getOperator(outputPortId.opId).outputPorts(outputPortId.portId)._3
         val schema =
           schemaOptional.getOrElse(throw new IllegalStateException("Schema is 
missing"))
-        DocumentFactory.createDocument(resultURI, schema)
-        DocumentFactory.createDocument(stateURI, State.schema)
+        // LoopEnd operators may re-execute the region multiple times; on
+        // subsequent iterations the result/state documents already exist,
+        // and `createDocument` (overrideIfExists=true) would clobber them.
+        // Skip the create call when the document is already there.
+        val isLoopEndRegion = region.getOperators.exists(_.isLoopEnd)
+        if (!isLoopEndRegion || !DocumentFactory.documentExists(resultURI)) {
+          DocumentFactory.createDocument(resultURI, schema)
+        }
+        if (!isLoopEndRegion || !DocumentFactory.documentExists(stateURI)) {
+          DocumentFactory.createDocument(stateURI, State.schema)
+        }

Review Comment:
   Added coverage in ca9e5ce8cc.
   
   Extracted the per-document create-or-reuse decision out of the private 
`createOutputPortStorageObjects` into a testable 
`RegionExecutionCoordinator.provisionOutputDocument(uri, schema, 
reuseIfExists)`, then added `RegionExecutionCoordinatorSpec` exercising the 
full truth table against a real iceberg-backed `DocumentFactory`:
   
   | reuseIfExists | doc state before | expected after |
   |---|---|---|
   | false | absent | empty doc created |
   | true | absent (1st iteration) | empty doc created |
   | **true** | **has 3 rows** | **3 rows preserved** ← the load-bearing 
invariant |
   | false | has 3 rows | wiped to 0 (contrast case) |
   
   The third case is the one you flagged: it proves a re-executing loop region 
does **not** wipe the output its previous iterations accumulated. The fourth 
shows the non-reuse path still starts fresh, so the guard is actually doing 
something. No behavior change — `createOutputPortStorageObjects` just delegates 
to the helper now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to