Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

via GitHub Wed, 27 Mar 2024 15:11:18 -0700


yihua commented on PR #10876:
URL: https://github.com/apache/hudi/pull/10876#issuecomment-2024077244


   > > We have #7146 which also attempted to solve the same problem. Should we
   > > close #7146 and prefer this one?
   >
   > That does not solve the problem, as the sorting (of the input batch) is
   > thrown away by the hash-based mapping of each record to a specific bucket.
   > This PR tries to solve the problem by implementing a new partitioner,
   > `UpsertSortPartitioner`, derived from `UpsertPartitioner`, which preserves
   > the sorted nature of the input batch (by assigning a contiguous range of
   > sorted input records to a single bucket/Spark partition).
   
   Then #7146 can be deprecated?
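
To make the quoted distinction concrete, here is a minimal sketch of hash-based versus contiguous-range bucket assignment. It is not the PR's `UpsertSortPartitioner`; the class `RangeBucketSketch` and its methods are hypothetical names used only for illustration, and the equal-sized-range rule is an assumption about one simple way to keep sorted neighbours together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Hudi code.
public class RangeBucketSketch {

    // Hash-based assignment: the bucket depends only on the key's hash,
    // so keys that are adjacent in sorted order can land in different buckets.
    static int hashBucket(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Range-based assignment (assumed rule): the record at sorted position i
    // goes to bucket i / bucketSize, so each bucket holds one contiguous run.
    static int rangeBucket(int sortedIndex, int totalRecords, int numBuckets) {
        int bucketSize = Math.max(1, (totalRecords + numBuckets - 1) / numBuckets);
        return sortedIndex / bucketSize;
    }

    // Splits an already sorted key list into numBuckets contiguous ranges.
    static List<List<String>> assignByRange(List<String> sortedKeys, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < sortedKeys.size(); i++) {
            int bucket = rangeBucket(i, sortedKeys.size(), numBuckets);
            buckets.get(bucket).add(sortedKeys.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> sortedKeys = Arrays.asList("k1", "k2", "k3", "k4", "k5", "k6");
        for (String key : sortedKeys) {
            System.out.println(key + " -> hash bucket " + hashBucket(key, 3));
        }
        System.out.println("range buckets: " + assignByRange(sortedKeys, 3));
    }
}
```

With range assignment, every bucket receives records in their original sorted order, which is the property the quoted comment says hash-based mapping discards.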


