priyen commented on code in PR #14139:
URL: https://github.com/apache/pinot/pull/14139#discussion_r1793667639


##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark3/SparkSegmentMetadataPushJobRunner.java:
##########
@@ -106,28 +192,123 @@ public void run() {
     } else {
       JavaSparkContext sparkContext = JavaSparkContext.fromSparkContext(SparkContext.getOrCreate());
       JavaRDD<String> pathRDD = sparkContext.parallelize(segmentsToPush, pushParallelism);
-      URI finalOutputDirURI = outputDirURI;
-      // Prevent using lambda expression in Spark to avoid potential serialization exceptions, use inner function
-      // instead.
-      pathRDD.foreach(new VoidFunction<String>() {
-        @Override
-        public void call(String segmentTarPath)
-            throws Exception {
-          PluginManager.get().init();
-          for (PinotFSSpec pinotFSSpec : pinotFSSpecs) {
-            PinotFSFactory
-                .register(pinotFSSpec.getScheme(), pinotFSSpec.getClassName(), new PinotConfiguration(pinotFSSpec));
+
+      if (_spec.getPushJobSpec().isBatchSegmentUpload()) {
+        // Process segments in batch mode using foreachPartition
+        pathRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
+          @Override
+          public void call(Iterator<String> segmentIterator) throws Exception {
+            PluginManager.get().init();
+            setupFileSystems();
+
+            List<String> segmentsInPartition = new ArrayList<>();
+            segmentIterator.forEachRemaining(segmentsInPartition::add);
+
+            try {
+              Map<String, String> segmentUriToTarPathMap =
+                  SegmentPushUtils.getSegmentUriToTarPathMap(outputDirURI, _spec.getPushJobSpec(),
+                      segmentsInPartition.toArray(new String[0]));
+              SegmentPushUtils.sendSegmentUriAndMetadata(_spec, PinotFSFactory.create(outputDirURI.getScheme()),
+                  segmentUriToTarPathMap);
+            } catch (RetriableOperationException | AttemptsExceededException e) {
+              throw new RuntimeException(e);
+            }
           }
-          try {
-            Map<String, String> segmentUriToTarPathMap = SegmentPushUtils
-                .getSegmentUriToTarPathMap(finalOutputDirURI, _spec.getPushJobSpec(), new String[]{segmentTarPath});
-            SegmentPushUtils.sendSegmentUriAndMetadata(_spec, PinotFSFactory.create(finalOutputDirURI.getScheme()),
-                segmentUriToTarPathMap);
-          } catch (RetriableOperationException | AttemptsExceededException e) {
-            throw new RuntimeException(e);
+        });
+      } else {
+        // Process segments one by one using foreach
+        pathRDD.foreach(new VoidFunction<String>() {
+          @Override
+          public void call(String segmentTarPath) throws Exception {

Review Comment:
   Yep, those are in the integration test code that I added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]