[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7374: [HUDI-5327] Fix spark stages when using row writer

GitBox Wed, 07 Dec 2022 14:06:43 -0800


alexeykudinkin commented on code in PR #7374:
URL: https://github.com/apache/hudi/pull/7374#discussion_r1042729026



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieDatasetBulkInsertHelper.scala:
##########
@@ -150,8 +151,8 @@ object HoodieDatasetBulkInsertHelper extends Logging {
       }
 
       writer.getWriteStatuses.asScala.map(_.toWriteStatus).iterator
-    }).collect()
-    table.getContext.parallelize(writeStatuses.toList.asJava)

Review Comment:
   @Zouxxyy in this case we should not rely on persist as a way to avoid 
double execution: persisting is essentially just a caching mechanism (it 
re-uses cached blocks on executors), so it should not be relied upon for 
correctness. The cache could be lost at any point (for example, if one of the 
executors fails), making you recompute the whole RDD.
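
A minimal sketch of the alternative to relying on persist, assuming a local Spark session: the write statuses are collected on the driver once and re-parallelized, so later actions read the already-computed results rather than re-running the write. The object name `MaterializeOnce`, the `status-$i` strings, and the accumulator are hypothetical stand-ins for illustration, not code from this PR.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical illustration: an accumulator counts how often the "write" runs.
object MaterializeOnce {
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("materialize-once-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      val writes = sc.longAccumulator("writes")

      // Stand-in for the row-writer stage: each element triggers one "write".
      val statuses = sc.parallelize(1 to 4, numSlices = 2).mapPartitions { it =>
        it.map { i => writes.add(1); s"status-$i" }
      }

      // Materialize exactly once on the driver.
      val collected = statuses.collect()
      val afterCollect = writes.sum

      // Re-parallelize the plain results; this RDD has no lineage back to the
      // write stage, so losing an executor cannot re-trigger the writes.
      val stable = sc.parallelize(collected.toSeq, 2)
      stable.count()
      stable.collect()

      (afterCollect, writes.sum)
    } finally {
      spark.stop()
    }
  }
}
```

In a failure-free local run the two returned counts should match, since the actions on `stable` never touch the write stage. With `statuses.persist()` instead, the same holds only as long as every cached block survives.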



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
