bhat-vinay commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2019436395
> We have #7146 which also attempted to solve the same problem. Should we close #7146 and prefer this one?

#7146 does not solve the problem, because the sorting of the input batch is thrown away by the hash-based mapping of each record to a specific bucket. This PR tries to solve the problem by implementing a new partitioner, `UpsertSortPartitioner`, derived from `UpsertPartitioner`, which preserves the sorted order of the input batch by assigning a contiguous range of sorted input records to a single bucket/Spark partition.
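
To illustrate the difference between the two assignment strategies, here is a minimal toy sketch in plain Python. It is not Hudi or Spark code: the function names (`hash_bucket`, `range_bucket`, `group`), the toy hash, and the sample keys are all hypothetical and chosen only to show why hash-based assignment scatters a sorted batch while contiguous-range assignment keeps each bucket's keys adjacent.

```python
from typing import List


def hash_bucket(key: str, num_buckets: int) -> int:
    """Hash-style assignment: the bucket depends only on the key's hash."""
    # A deterministic toy hash (sum of code points) stands in for a real hash function.
    return sum(ord(c) for c in key) % num_buckets


def range_bucket(position: int, total: int, num_buckets: int) -> int:
    """Range-style assignment: contiguous positions in the sorted batch share a bucket."""
    per_bucket = -(-total // num_buckets)  # ceiling division
    return position // per_bucket


def group(keys: List[str], buckets: List[int], num_buckets: int) -> List[List[str]]:
    """Collect keys into their assigned buckets, keeping arrival order."""
    out: List[List[str]] = [[] for _ in range(num_buckets)]
    for key, bucket in zip(keys, buckets):
        out[bucket].append(key)
    return out


# Hypothetical sorted input batch and bucket count.
sorted_keys = sorted(["k01", "k02", "k03", "k04", "k05", "k06"])
n = 3

by_hash = group(sorted_keys, [hash_bucket(k, n) for k in sorted_keys], n)
by_range = group(
    sorted_keys,
    [range_bucket(i, len(sorted_keys), n) for i in range(len(sorted_keys))],
    n,
)

print("hash :", by_hash)
print("range:", by_range)
```

With the toy hash, neighbouring keys land in different buckets, so every bucket spans the whole key range. With range assignment, each bucket holds one contiguous slice of the sorted batch, and concatenating the buckets in order reproduces the sorted input.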
