Re: [PR] Spark Runner : Replace queueStream with custom DStream in Spark streaming Flatten transform [beam]

via GitHub Fri, 28 Feb 2025 22:13:46 -0800


twosom commented on code in PR #34080:
URL: https://github.com/apache/beam/pull/34080#discussion_r1976322756



##########
runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkPipelineOptions.java:
##########
@@ -48,6 +48,12 @@ public interface TestSparkPipelineOptions extends 
SparkPipelineOptions, TestPipe
 
   void setStopPipelineWatermark(Long stopPipelineWatermark);
 
+  @Description("Whether to delete the checkpoint directory after the pipeline 
execution.")
+  @Default.Boolean(true)
+  boolean isDeleteCheckpointDir();
+
+  void setDeleteCheckpointDir(boolean deleteCheckpointDir);

Review Comment:
   By default, TestSparkRunner is configured to delete the checkpoint directory 
after a job is completed. This behavior can be observed in the Apache Beam 
source code at the following link:
   
   
https://github.com/apache/beam/blob/468001b3a09297fa4fc721d9520654ffd4afb7dc/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java#L110.
   
   However, for the test scenario in question, when running two consecutive 
jobs using TestSparkRunner, it is necessary to retain the checkpoint directory 
from the first job instead of allowing it to be deleted. This ensures that the 
subsequent job can utilize the checkpoint data as needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Spark Runner : Replace queueStream with custom DStream in Spark streaming Flatten transform [beam]

Reply via email to