priyen commented on code in PR #14139:
URL: https://github.com/apache/pinot/pull/14139#discussion_r1793668591
##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark3/SparkSegmentMetadataPushJobRunner.java:
##########
@@ -41,16 +50,45 @@
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
+import org.apache.spark.scheduler.JobFailed;
+import org.apache.spark.scheduler.JobResult;
+import org.apache.spark.scheduler.SparkListener;
+import org.apache.spark.scheduler.SparkListenerJobEnd;
public class SparkSegmentMetadataPushJobRunner implements IngestionJobRunner, Serializable {
- private SegmentGenerationJobSpec _spec;
+ // This listener is added to the SparkContext and is executed when the Spark job fails.
+ // It handles the failure by calling ConsistentDataPushUtils.handleUploadException.
+ // The listener is only added if consistent data push is enabled and the pushParallelism is greater than 1.
+ // This listener is not a required part of the implementation, as the start replace segments protocol
+ // will cleanup past failures as part of the fresh consistent data push, but it's still cleaner to handle
+ // the failure as soon as possible.
+ private static class ConsistentDataPushFailureHandler extends SparkListener {
+ private final SegmentGenerationJobSpec _spec;
+ private final Map<URI, String> _uriToLineageEntryIdMap;
- public SparkSegmentMetadataPushJobRunner() {
+ public ConsistentDataPushFailureHandler(SegmentGenerationJobSpec spec, Map<URI, String> uriToLineageEntryIdMap) {
Review Comment:
No, since it is inside the failure path we can't replicate it; but I manually
tested this by running a Spark job that fails.
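
For reference, the listener pattern described in the code comment above can be
sketched without Spark on the classpath. This is only an illustration under
stated assumptions, not the PR's implementation: `JobResult`, `JobSucceeded`,
`JobFailed`, and `JobListener` below are hypothetical stand-ins for the Spark
scheduler types, and `handledEntries` is a hypothetical recorder standing in
for the call to ConsistentDataPushUtils.handleUploadException.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailureListenerSketch {
  // Hypothetical stand-ins for Spark's job result types.
  interface JobResult {}

  static final class JobSucceeded implements JobResult {}

  static final class JobFailed implements JobResult {
    final Exception exception;

    JobFailed(Exception exception) {
      this.exception = exception;
    }
  }

  // Hypothetical stand-in for a listener base class with a no-op callback.
  abstract static class JobListener {
    void onJobEnd(JobResult result) {}
  }

  // Reacts only to failed jobs; successful jobs are ignored.
  static final class FailureHandler extends JobListener {
    private final Map<URI, String> uriToLineageEntryIdMap;
    // Hypothetical recorder: the real handler would call the cleanup utility here.
    final List<String> handledEntries = new ArrayList<>();

    FailureHandler(Map<URI, String> uriToLineageEntryIdMap) {
      this.uriToLineageEntryIdMap = uriToLineageEntryIdMap;
    }

    @Override
    void onJobEnd(JobResult result) {
      if (result instanceof JobFailed) {
        handledEntries.addAll(uriToLineageEntryIdMap.values());
      }
    }
  }

  public static void main(String[] args) {
    Map<URI, String> entries = new LinkedHashMap<>();
    entries.put(URI.create("http://localhost:9000"), "lineage-entry-1");
    FailureHandler handler = new FailureHandler(entries);

    handler.onJobEnd(new JobSucceeded());
    System.out.println("after success: " + handler.handledEntries);

    handler.onJobEnd(new JobFailed(new RuntimeException("push failed")));
    System.out.println("after failure: " + handler.handledEntries);
  }
}
```

The handler does nothing on success and acts once on failure, which is the
behaviour the code comment describes for the listener.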
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]