Re: [PR] Implement consistent data push for spark 3 segment generation and metadata push jobs [pinot]

via GitHub Wed, 09 Oct 2024 07:53:17 -0700


priyen commented on code in PR #14139:
URL: https://github.com/apache/pinot/pull/14139#discussion_r1793676490



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark3/SparkSegmentMetadataPushJobRunner.java:
##########
@@ -106,28 +192,123 @@ public void run() {
     } else {
       JavaSparkContext sparkContext = JavaSparkContext.fromSparkContext(SparkContext.getOrCreate());
       JavaRDD<String> pathRDD = sparkContext.parallelize(segmentsToPush, pushParallelism);
-      URI finalOutputDirURI = outputDirURI;
-      // Prevent using lambda expression in Spark to avoid potential serialization exceptions, use inner function
-      // instead.
-      pathRDD.foreach(new VoidFunction<String>() {
-        @Override
-        public void call(String segmentTarPath)
-            throws Exception {
-          PluginManager.get().init();
-          for (PinotFSSpec pinotFSSpec : pinotFSSpecs) {
-            PinotFSFactory
-                .register(pinotFSSpec.getScheme(), pinotFSSpec.getClassName(), new PinotConfiguration(pinotFSSpec));
+
+      if (_spec.getPushJobSpec().isBatchSegmentUpload()) {

Review Comment:
   Yes, that's right. I use `foreachPartition` when batch upload is true and
   `foreach` when it's false.
   The code looks the same, but the difference is that `segmentUriToTarPathMap`
   holds multiple segments in batch mode and only one in non-batch mode.
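
   To illustrate the distinction, here is a minimal plain-Java sketch with no Spark dependency. It is not the PR's code: partitions are simulated as contiguous sublists, and `BatchPushSketch`, `pushSegments`, `partition`, `CALL_SIZES`, and the `uri:` key prefix are hypothetical names chosen for illustration. Only `segmentUriToTarPathMap`, `segmentsToPush`, `pushParallelism`, and the batch upload flag come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchPushSketch {
  // Records how many segments each simulated push call received.
  static final List<Integer> CALL_SIZES = new ArrayList<>();

  // Hypothetical stand-in for the shared push step used by both code paths.
  static void pushSegments(Map<String, String> segmentUriToTarPathMap) {
    CALL_SIZES.add(segmentUriToTarPathMap.size());
  }

  // Simulates partitioning by slicing the list into at most numPartitions contiguous chunks.
  // Assumes numPartitions > 0; real Spark partitioning may distribute elements differently.
  static List<List<String>> partition(List<String> items, int numPartitions) {
    List<List<String>> parts = new ArrayList<>();
    int chunk = (items.size() + numPartitions - 1) / numPartitions;
    for (int i = 0; i < items.size(); i += chunk) {
      parts.add(items.subList(i, Math.min(i + chunk, items.size())));
    }
    return parts;
  }

  static void run(List<String> segmentsToPush, int pushParallelism, boolean batchSegmentUpload) {
    CALL_SIZES.clear();
    for (List<String> part : partition(segmentsToPush, pushParallelism)) {
      if (batchSegmentUpload) {
        // foreachPartition-style: one map holding every segment in the partition, one push.
        Map<String, String> segmentUriToTarPathMap = new LinkedHashMap<>();
        for (String tarPath : part) {
          segmentUriToTarPathMap.put("uri:" + tarPath, tarPath);
        }
        pushSegments(segmentUriToTarPathMap);
      } else {
        // foreach-style: a single-entry map and one push per segment.
        for (String tarPath : part) {
          Map<String, String> segmentUriToTarPathMap = new LinkedHashMap<>();
          segmentUriToTarPathMap.put("uri:" + tarPath, tarPath);
          pushSegments(segmentUriToTarPathMap);
        }
      }
    }
  }

  public static void main(String[] args) {
    List<String> segments = Arrays.asList("s1.tar.gz", "s2.tar.gz", "s3.tar.gz", "s4.tar.gz", "s5.tar.gz");
    run(segments, 2, true);
    System.out.println("batch call sizes: " + CALL_SIZES);
    run(segments, 2, false);
    System.out.println("non-batch call sizes: " + CALL_SIZES);
  }
}
```

   With five segments and a parallelism of 2, the batch path in this sketch makes two push calls (three segments, then two), while the non-batch path makes five calls of one segment each. The push step itself is identical in both branches; only the size of the map differs.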



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

