LsomeYeah opened a new pull request, #6826:
URL: https://github.com/apache/paimon/pull/6826

   <!-- Please specify the module before the PR name: [core] ... or [flink] ... 
-->
   
   ### Purpose
   
   <!-- Linking this pull request to the issue -->
   Linked issue: close #xxx
   
   <!-- What is the purpose of the change -->
   As https://github.com/apache/paimon/pull/2749 noted, when record sizes vary 
significantly, range partitioning based only on the number of records can cause 
data skew, which slows down the affected parallel subtasks. In extreme cases, 
especially in Flink, the skew can fill up a TaskManager's local disk and cause 
the task to fail. 
   
   Considering record size during range partitioning can alleviate this issue. 
This PR sets `SIZE` as the default strategy for range partitioning.
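
   To illustrate why weighting by size helps, here is a minimal sketch that picks range boundaries from sampled `(key, size_in_bytes)` pairs, once by record count and once by bytes. It is not Paimon's implementation; the function names `range_boundaries` and `bytes_per_range` and the sample data are hypothetical and only show the idea.

```python
import bisect
from typing import List, Sequence, Tuple

Sample = Tuple[int, int]  # (sort key, record size in bytes); hypothetical shape


def range_boundaries(samples: Sequence[Sample], num_ranges: int, by_size: bool) -> List[int]:
    """Pick num_ranges - 1 inclusive upper-bound keys from the samples.

    by_size=False gives every record a weight of 1 (count-based);
    by_size=True weights each record by its size in bytes.
    """
    ordered = sorted(samples)
    weights = [size if by_size else 1 for _, size in ordered]
    target = sum(weights) / num_ranges  # ideal weight per range

    boundaries: List[int] = []
    accumulated = 0
    for (key, _), weight in zip(ordered, weights):
        accumulated += weight
        # Close the current range once its cumulative target is reached.
        if len(boundaries) < num_ranges - 1 and accumulated >= target * (len(boundaries) + 1):
            boundaries.append(key)
    return boundaries


def bytes_per_range(samples: Sequence[Sample], boundaries: List[int]) -> List[int]:
    """Sum the bytes that land in each range (keys <= boundary go left)."""
    loads = [0] * (len(boundaries) + 1)
    for key, size in samples:
        loads[bisect.bisect_left(boundaries, key)] += size
    return loads


# Six small records followed by two large ones.
samples = [(k, 10) for k in range(6)] + [(6, 1000), (7, 1000)]

by_count = range_boundaries(samples, 2, by_size=False)
by_bytes = range_boundaries(samples, 2, by_size=True)

print(by_count, bytes_per_range(samples, by_count))  # [3] [40, 2020]
print(by_bytes, bytes_per_range(samples, by_bytes))  # [6] [1060, 1000]
```

   With count-based boundaries both ranges hold four records, but the second range receives almost all of the bytes. With size-based boundaries the record counts differ while the byte load per range is roughly even.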
   
   
   ### Tests
   
   <!-- List UT and IT cases to verify this change -->
   
   ### API and Format
   
   <!-- Does this change affect API or storage format -->
   
   ### Documentation
   
   <!-- Does this change introduce a new feature -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
