hantangwangd opened a new issue, #13886: URL: https://github.com/apache/iceberg/issues/13886
### Feature Request / Improvement

Currently, constructing a PartitionSpec with multiple redundant transforms on the same column (e.g., day(ts), month(ts)) fails as follows:

```
Cannot add redundant partition: 1000: ts_day: day(3) conflicts with 1001: ts_month: month(3)
```

However, when using `UpdatePartitionSpec` to update a PartitionSpec, this validation is absent. We can successfully add multiple redundant transforms for a source column (the only restriction being that they cannot be added in a single operation). This allows a computation engine, when adding a column and specifying its partition transforms (via multiple operations within a single transaction) or when just adding partition fields, to ultimately build a PartitionSpec that contains multiple redundant transforms from the same source column.

This inconsistency may cause problems during distributed reads/writes with engines that use Iceberg. For example, in Presto, the coordinator serializes the Iceberg table's `Schema` and `PartitionSpec` and sends them to the workers. The workers then deserialize and reconstruct the `Schema` and `PartitionSpec`. This reconstruction can fail with the validation error above, ultimately causing the read or write operation to fail.

In my understanding, the validation logic for redundant transforms on the same column should be consistent between constructing a PartitionSpec and updating it. If the Iceberg core does support multiple redundant transforms from a single source column, then the associated validation check should be removed from the PartitionSpec builder to prevent unnecessary failures. Conversely, if the Iceberg core does not support multiple redundant transforms on the same source column, then `UpdatePartitionSpec` should apply the same validation as the PartitionSpec builder. This would reduce potential issues when compute engines use the relevant Iceberg interfaces.
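To make the inconsistency concrete, here is a minimal, self-contained sketch of the two code paths. It is a simplified model, not Iceberg's actual code: the class `RedundantTransformSketch`, the `Field` type, and the methods `buildStrict`, `addFieldLenient`, and `dedupKey` are hypothetical names, and the rule that `year`, `month`, `day`, and `hour` on the same source column count as one redundant "time" family is an assumption based on the error shown above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the two code paths; these are not Iceberg classes.
final class RedundantTransformSketch {

  /** A partition field: a named transform applied to a source column id. */
  static final class Field {
    final int fieldId;
    final String name;
    final String transform;
    final int sourceId;

    Field(int fieldId, String name, String transform, int sourceId) {
      this.fieldId = fieldId;
      this.name = name;
      this.transform = transform;
      this.sourceId = sourceId;
    }

    @Override
    public String toString() {
      return fieldId + ": " + name + ": " + transform + "(" + sourceId + ")";
    }
  }

  // Assumption: time-based transforms on one column share a single dedup key.
  static String dedupKey(Field f) {
    switch (f.transform) {
      case "year":
      case "month":
      case "day":
      case "hour":
        return f.sourceId + ":time";
      default:
        return f.sourceId + ":" + f.transform;
    }
  }

  /** Models the strict path (spec construction): rejects redundant transforms. */
  static List<Field> buildStrict(List<Field> fields) {
    Map<String, Field> seen = new HashMap<>();
    for (Field f : fields) {
      Field prev = seen.putIfAbsent(dedupKey(f), f);
      if (prev != null) {
        throw new IllegalArgumentException(
            "Cannot add redundant partition: " + prev + " conflicts with " + f);
      }
    }
    return Collections.unmodifiableList(new ArrayList<>(fields));
  }

  /** Models the lenient path (spec update): appends a field with no redundancy check. */
  static List<Field> addFieldLenient(List<Field> current, Field added) {
    List<Field> updated = new ArrayList<>(current);
    updated.add(added);
    return updated;
  }

  public static void main(String[] args) {
    // Two separate update operations succeed on the lenient path...
    List<Field> spec = new ArrayList<>();
    spec = addFieldLenient(spec, new Field(1000, "ts_day", "day", 3));
    spec = addFieldLenient(spec, new Field(1001, "ts_month", "month", 3));
    try {
      // ...but rebuilding the same fields on the strict path (as a worker would) fails.
      buildStrict(spec);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the spec produced by two lenient updates cannot be reconstructed through the strict path, which mirrors the worker-side failure described above. Applying the same check in both paths, or in neither, would remove the mismatch.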
### Query engine

PrestoDB

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
