advancedxy opened a new issue, #8258: URL: https://github.com/apache/iceberg/issues/8258
### Feature Request / Improvement

As discussed in #5626, it would be nice to support multi-arg transforms in Iceberg, especially for the bucket transform.

I wrote up a design doc for this improvement: https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit?usp=sharing

Quoted background from the doc:

> Iceberg uses a transform to produce partitioning value from a source value. Currently the supported transforms are: `Years`, `Months`, `Days`, `Hours`, `Identity`, `Void`, `Truncate`, `Bucket`. Since the current spec requires that each partitioning field consists of a source column id in the table’s schema, the above transforms only accept one argument as its input. However, it’s possible and quite common to use multiple arguments to produce a partitioning value, especially for the `Bucket` transform. Other transforms might require multiple arguments in the future. This document tries to add multi-arg transform support in Iceberg, especially for the bucket transform.

------

I also did a PoC version of how a multi-arg bucket transform would be supported in Spark: . Some places are not modified yet, such as UpdatePartitionSpec and the TableMetadata-related code. I'd like to get feedback from the community before going much further. If we reach consensus that multi-arg transforms should be supported, and the spec changes stabilize after review, I will update my code accordingly and extend the support to the Flink engine.

### Query engine

Spark

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
