bhat-vinay commented on code in PR #10876:
URL: https://github.com/apache/hudi/pull/10876#discussion_r1535704794
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
##########
@@ -230,6 +236,10 @@ protected Partitioner getPartitioner(WorkloadProfile
profile) {
}
private HoodieData<WriteStatus>
mapPartitionsAsRDD(HoodieData<HoodieRecord<T>> dedupedRecords, Partitioner
partitioner) {
+ if (operationRequiresSorting()) {
Review Comment:
What does sorting mean for 'upsert' operation. If the record is really being
updated, wont there be a index lookup which routes the record to its specific
filegroup? Or is there benefit of supporting sorting when an upsert batch
contains new records that are getting written for the first time? This PR
allows sorting only for INSERT operation.
`BaseSparkCommitActionExecutor::operationRequiresSorting(...)` takes care of
that. If the config needs to be made ambiguity-proof for future usecases,
should I rename it to `WRITE_SORT_MODE`, `WRITE_SORT_OPERATIONS` and
`WRITE_USER_DEFINED_PARTITIONER_SORT_COLUMNS`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]