[GitHub] [iceberg] advancedxy opened a new issue, #8258: multi-arg transform support

via GitHub Tue, 08 Aug 2023 08:01:27 -0700


advancedxy opened a new issue, #8258:
URL: https://github.com/apache/iceberg/issues/8258

### Feature Request / Improvement

As discussed in #5626, it would be nice for Iceberg to support multi-arg
transforms, especially for the bucket transform.

I wrote up a design doc for this improvement:
https://docs.google.com/document/d/1aDoZqRgvDOOUVAGhvKZbp5vFstjsAMY4EFCyjlxpaaw/edit?usp=sharing

Quoted background from the doc:
> Iceberg uses a transform to produce partitioning value from a source
value. Currently the supported transforms are: `Years`, `Months`, `Days`,
`Hours`, `Identity`, `Void`, `Truncate`, `Bucket`. Since the current spec
requires that each partitioning field consists of a source column id in the
table’s schema, the above transforms only accept one argument as its input.
However, it’s possible and quite common to use multiple arguments to produce a
partitioning value, especially for the `Bucket` transform. Other transforms
might require multiple arguments in the future. This document tries to add
multi-arg transform support in Iceberg, especially for the bucket transform.
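
To make the idea concrete, here is a minimal sketch of what a bucket transform over several source values could look like. It is not taken from the design doc or the PoC: the function name `bucket`, the length-prefixed encoding, and the use of SHA-256 are all assumptions made for illustration. Iceberg's actual single-argument bucket transform is specified as `(murmur3_x86_32_hash(x) & Integer.MAX_VALUE) % N`; only the final mask-and-modulo step is mirrored here.

```python
import hashlib
import struct


def bucket(num_buckets: int, *values) -> int:
    """Hypothetical multi-arg bucket: hash all source values together, then mod N.

    SHA-256 is a stand-in for the 32-bit Murmur3 hash that Iceberg specifies;
    the way values are combined is an assumption, not the proposed spec change.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    if not values:
        raise ValueError("at least one source value is required")

    digest = hashlib.sha256()
    for value in values:
        encoded = repr(value).encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") feed
        # different byte streams into the hash.
        digest.update(struct.pack(">I", len(encoded)))
        digest.update(encoded)

    # Use the first 4 bytes as an unsigned 32-bit value, clear the sign bit,
    # and reduce modulo the bucket count, as the single-arg transform does.
    hash32 = struct.unpack(">I", digest.digest()[:4])[0]
    return (hash32 & 0x7FFFFFFF) % num_buckets
```

Calling `bucket(16, 42, "us-east")` yields one partition value in the range `[0, 16)` derived from both columns, rather than one value per column.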

------
I also wrote a PoC of how a multi-arg bucket transform would be supported in
Spark: . Some places are not modified yet, such as UpdatePartitionSpec and the
TableMetadata-related code.
I'd like to get feedback from the community before going much further.

If we reach consensus that multi-arg transforms should be supported and the
spec changes stabilize after review, I will update my code accordingly and
extend the support to the Flink engine.

### Query engine

Spark

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
