twosom commented on code in PR #34080:
URL: https://github.com/apache/beam/pull/34080#discussion_r1976322756
##########
runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkPipelineOptions.java:
##########
@@ -48,6 +48,12 @@ public interface TestSparkPipelineOptions extends
SparkPipelineOptions, TestPipe
void setStopPipelineWatermark(Long stopPipelineWatermark);
+ @Description("Whether to delete the checkpoint directory after the pipeline
execution.")
+ @Default.Boolean(true)
+ boolean isDeleteCheckpointDir();
+
+ void setDeleteCheckpointDir(boolean deleteCheckpointDir);
Review Comment:
By default, TestSparkRunner is configured to delete the checkpoint directory
after a job is completed. This behavior can be observed in the Apache Beam
source code at the following link:
https://github.com/apache/beam/blob/468001b3a09297fa4fc721d9520654ffd4afb7dc/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java#L110.
However, for the test scenario in question, when running two consecutive
jobs using TestSparkRunner, it is necessary to retain the checkpoint directory
from the first job instead of allowing it to be deleted. This ensures that the
subsequent job can utilize the checkpoint data as needed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]