alexeykudinkin commented on code in PR #7374:
URL: https://github.com/apache/hudi/pull/7374#discussion_r1042729026
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieDatasetBulkInsertHelper.scala:
##########
@@ -150,8 +151,8 @@ object HoodieDatasetBulkInsertHelper extends Logging {
}
writer.getWriteStatuses.asScala.map(_.toWriteStatus).iterator
- }).collect()
- table.getContext.parallelize(writeStatuses.toList.asJava)
Review Comment:
@Zouxxyy in this case we should not rely on persist as a way to avoid
double execution. Persisting is essentially just a caching mechanism
(re-using cached blocks on executors), so it should not be relied upon:
the cached blocks can be lost at any point (for example, if one of the
executors fails), making you recompute the whole RDD.
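
To illustrate the point, here is a minimal sketch (not the code from this PR) of materializing results on the driver with collect() and re-parallelizing them, so that later actions cannot re-run the side-effecting stage. It assumes a local Spark session; the object name, the accumulator name, and the string "statuses" are hypothetical stand-ins for the real write path.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: a LongAccumulator stands in for the side effect
// (writing files) performed while producing write statuses.
object CollectVsPersistSketch {
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("collect-vs-persist-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      val simulatedWrites = sc.longAccumulator("simulatedWrites")

      // Lazy RDD: the map body runs every time a partition is (re)computed.
      val statuses = sc.parallelize(1 to 4, 2).map { i =>
        simulatedWrites.add(1) // stand-in for the actual write
        s"status-$i"
      }

      // Materialize exactly once on the driver, then build a new RDD whose
      // lineage starts from the in-memory results, not from the write stage.
      val materialized = statuses.collect()
      val safe = sc.parallelize(materialized.toSeq, 2)

      // Repeated actions on `safe` never re-trigger the write stage.
      safe.count()
      safe.count()

      simulatedWrites.sum
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"simulated writes: ${run()}")
}
```

If `statuses.persist()` were used instead and both actions ran against `statuses`, a lost cached block would silently send Spark back through the lineage, re-running the map body for the affected partitions. The trade-off of collect() is that all results must fit in driver memory, which is usually acceptable for a small list of write statuses.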