priyen commented on code in PR #14139:
URL: https://github.com/apache/pinot/pull/14139#discussion_r1792190115
##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark-3/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark3/SparkSegmentMetadataPushJobRunner.java:
##########
@@ -175,28 +192,52 @@ private void handleNonConsistentPush(List<String> segmentsToPush, PinotFS output
} else {
JavaSparkContext sparkContext = JavaSparkContext.fromSparkContext(SparkContext.getOrCreate());
JavaRDD<String> pathRDD = sparkContext.parallelize(segmentsToPush, pushParallelism);
- // Prevent using lambda expression in Spark to avoid potential serialization exceptions, use inner function
- // instead.
- pathRDD.foreach(new VoidFunction<String>() {
- @Override
- public void call(String segmentTarPath)
- throws Exception {
- PluginManager.get().init();
- setupFileSystems();
- try {
- Map<String, String> segmentUriToTarPathMap =
- SegmentPushUtils.getSegmentUriToTarPathMap(outputDirURI, _spec.getPushJobSpec(),
- new String[]{segmentTarPath});
- SegmentPushUtils.sendSegmentUriAndMetadata(_spec, PinotFSFactory.create(outputDirURI.getScheme()),
- segmentUriToTarPathMap);
- } catch (RetriableOperationException | AttemptsExceededException e) {
- throw new RuntimeException(e);
+
+ if (_spec.getPushJobSpec().isBatchSegmentUpload()) {
+ // Process segments in batch mode using foreachPartition
+ pathRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
Review Comment:
[swaminathanmanish](https://github.com/swaminathanmanish), @rajagopr, and I
discussed this in Slack, and we are happy with the current implementation in
this PR: it gives users the flexibility to push everything in one go (push
parallelism == 1) or to raise the parallelism if the scale is too large to
handle in one go.
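
To make the trade-off concrete, here is a minimal sketch in plain Java (no Spark dependency) of how the push parallelism determines the batch sizes. The class `PushBatchSketch` and the helper `batches` are hypothetical names for illustration only, and the contiguous, near-even split is an assumption that only approximates how `parallelize(segmentsToPush, pushParallelism)` distributes elements before `foreachPartition` handles each partition as one batch; it is not the PR's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PushBatchSketch {

  // Hypothetical helper: split the segment list into `pushParallelism`
  // contiguous slices, one slice per partition (i.e. one batch upload each).
  // Assumption: a near-even contiguous split, used here only as a stand-in
  // for the partitioning that Spark performs.
  static List<List<String>> batches(List<String> segments, int pushParallelism) {
    if (pushParallelism < 1) {
      throw new IllegalArgumentException("pushParallelism must be >= 1");
    }
    List<List<String>> result = new ArrayList<>();
    int n = segments.size();
    for (int i = 0; i < pushParallelism; i++) {
      // Use long arithmetic so i * n cannot overflow for large inputs.
      int start = (int) ((long) i * n / pushParallelism);
      int end = (int) ((long) (i + 1) * n / pushParallelism);
      result.add(new ArrayList<>(segments.subList(start, end)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> segments = Arrays.asList(
        "seg_0.tar.gz", "seg_1.tar.gz", "seg_2.tar.gz", "seg_3.tar.gz", "seg_4.tar.gz");

    // Parallelism 1: every segment lands in a single batch.
    System.out.println(batches(segments, 1));

    // Parallelism 2: the same segments are spread over two smaller batches.
    System.out.println(batches(segments, 2));
  }
}
```

With a push parallelism of 1 the whole list forms a single batch; a higher value yields several smaller batches that can be pushed independently, which is the flexibility described above.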
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]