[GitHub] [hive] szlta opened a new pull request #3060: HIVE-25975: Optimize ClusteredWriter for bucketed Iceberg tables

GitBox Mon, 28 Feb 2022 08:48:36 -0800


szlta opened a new pull request #3060:
URL: https://github.com/apache/hive/pull/3060



   This adds a new UDF that uses Iceberg's bucket transformation function to 
produce bucket values from constants or any column input. The UDF supports 
every type that Iceberg's bucket transform supports, except UUID.
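
   For readers unfamiliar with the transform, here is a minimal sketch of how 
a bucket value is derived, following the definition in the Iceberg table spec 
(a 32-bit Murmur3 hash, masked to be non-negative, modulo the bucket count). 
The function names are illustrative only and are not the names used by the 
UDF in this PR; only int/long and string inputs are covered.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant); returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    body_end = n - (n % 4)

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        return (k * c2) & mask

    # Process the input in 4-byte little-endian blocks.
    for i in range(0, body_end, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Mix in the 1 to 3 trailing bytes, if any.
    tail = data[body_end:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization (avalanche).
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def iceberg_hash(value) -> int:
    """Hash a value the way the Iceberg spec describes for ints and strings."""
    if isinstance(value, int):
        # int and long are both hashed as 8-byte little-endian longs.
        return murmur3_x86_32(value.to_bytes(8, "little", signed=True))
    if isinstance(value, str):
        return murmur3_x86_32(value.encode("utf-8"))
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def bucket(value, num_buckets: int) -> int:
    """bucket_N(x) = (hash(x) & Integer.MAX_VALUE) % N"""
    return (iceberg_hash(value) & 0x7FFFFFFF) % num_buckets
```

   Because the result depends only on the value and the bucket count, the 
same expression can be evaluated on constants or on column values.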
   
   This UDF is then used in SortedDynPartitionOptimizer to sort data during 
write if the target Iceberg table has bucket transform partitioning. 
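
   To illustrate why sorting helps, here is a toy sketch. It assumes a 
clustered writer keeps only one partition's file open at a time and fails if 
a partition reappears after it was closed. `ClusteredWriter`, `toy_bucket` 
and `write_all` below are hypothetical stand-ins written for this example, 
not the Iceberg or Hive classes, and `toy_bucket` uses a plain modulo instead 
of Iceberg's Murmur3-based transform.

```python
class ClusteredWriter:
    """Toy writer: one open partition at a time, no revisiting closed ones."""

    def __init__(self):
        self.current = None   # partition currently being written
        self.closed = set()   # partitions whose files are already closed
        self.files = {}       # partition -> rows written to it

    def write(self, partition, row):
        if partition != self.current:
            if partition in self.closed:
                raise ValueError(f"partition {partition} was already closed")
            if self.current is not None:
                self.closed.add(self.current)
            self.current = partition
            self.files[partition] = []
        self.files[partition].append(row)


def toy_bucket(value: int, num_buckets: int) -> int:
    # Stand-in for the bucket transform; the real one hashes the value first.
    return value % num_buckets


def write_all(rows, num_buckets, sort_first):
    """Write rows through the toy writer, optionally sorted by bucket value."""
    keyed = [(toy_bucket(r, num_buckets), r) for r in rows]
    if sort_first:
        # Stable sort: rows of the same bucket become one contiguous run.
        keyed.sort(key=lambda pair: pair[0])
    writer = ClusteredWriter()
    for bucket_value, row in keyed:
        writer.write(bucket_value, row)
    return writer.files
```

   With `sort_first=True` every bucket arrives as a single run, so the toy 
writer never has to reopen a closed partition; with unsorted input such as 
`[0, 1, 2]` and two buckets it raises on the third row.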
   
   To enable this, Hive now allows storage handlers to define custom sorting 
expressions, which are passed to the FileSink operator's DynPartContext 
during dynamic partitioning writes.
   
   The lenient version of ClusteredWriter in patched-iceberg-core has been 
removed, as it is no longer needed with this feature in place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



