hantangwangd opened a new issue, #13886:
URL: https://github.com/apache/iceberg/issues/13886

   ### Feature Request / Improvement
   
   Currently, constructing a PartitionSpec with multiple redundant transforms 
on the same column (e.g., day(ts), month(ts)) fails as follows:
   
   ```
   Cannot add redundant partition: 1000: ts_day: day(3) conflicts with 1001: ts_month: month(3)
   ```
   
   However, when a PartitionSpec is changed through `UpdatePartitionSpec`, this
   validation is not applied in the same way. Redundant transforms on one source
   column can be added successfully; the only restriction is that they cannot be
   added in a single operation. As a result, a compute engine that adds a column
   and specifies its partition transforms (via multiple operations within a
   single transaction), or that simply adds partition fields, can end up with a
   PartitionSpec that contains multiple redundant transforms on the same source
   column.
   
   This inconsistency can cause problems during distributed reads and writes in
   engines that use Iceberg. For example, in Presto, the coordinator serializes
   the Iceberg table's `Schema` and `PartitionSpec` and sends them to the
   workers. The workers then deserialize and reconstruct the `Schema` and
   `PartitionSpec`. This reconstruction can fail with the validation error
   above, which in turn causes the read or write operation to fail.
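
To make the mismatch concrete, here is a minimal self-contained sketch. It is a toy model, not the Iceberg API: `Field`, `buildStrict`, `updateLenient`, and the dedup rule (all time-based transforms on one column share one key) are hypothetical names and assumptions chosen for illustration. It only shows how a whole-spec check and a per-operation check can disagree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: every class and method here is hypothetical, not the Iceberg API.
class RedundantPartitionDemo {

  // A partition field: a transform name applied to a source column.
  static final class Field {
    final String source;
    final String transform;

    Field(String source, String transform) {
      this.source = source;
      this.transform = transform;
    }

    // Assumption: all time-based transforms on one column share a single dedup key.
    String dedupKey() {
      boolean timeBased = Arrays.asList("year", "month", "day", "hour").contains(transform);
      return source + ":" + (timeBased ? "time" : transform);
    }

    @Override
    public String toString() {
      return transform + "(" + source + ")";
    }
  }

  // Strict path: validates the whole field list, as a spec builder would.
  static List<Field> buildStrict(List<Field> fields) {
    Set<String> seen = new HashSet<>();
    for (Field f : fields) {
      if (!seen.add(f.dedupKey())) {
        throw new IllegalArgumentException("Cannot add redundant partition: " + f);
      }
    }
    return new ArrayList<>(fields);
  }

  // Lenient path: validates only the fields added in this one operation.
  static List<Field> updateLenient(List<Field> existing, List<Field> adds) {
    buildStrict(adds); // existing fields are not part of the check
    List<Field> result = new ArrayList<>(existing);
    result.addAll(adds);
    return result;
  }

  public static void main(String[] args) {
    List<Field> spec = updateLenient(new ArrayList<>(), Arrays.asList(new Field("ts", "day")));
    spec = updateLenient(spec, Arrays.asList(new Field("ts", "month")));
    System.out.println("after two updates: " + spec);

    try {
      buildStrict(spec); // stands in for a worker rebuilding the spec
    } catch (IllegalArgumentException e) {
      System.out.println("rebuild failed: " + e.getMessage());
    }
  }
}
```

In this model, two separate updates produce a spec that the strict path then refuses to rebuild, which is the shape of the failure described above. Adding both fields in one `updateLenient` call is rejected, matching the single-operation restriction.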
   
   In my understanding, the validation of redundant transforms on the same
   column should be consistent between constructing a PartitionSpec and
   updating it. If Iceberg core does support multiple redundant transforms on a
   single source column, then the corresponding check should be removed from the
   PartitionSpec builder to prevent unnecessary failures. Conversely, if Iceberg
   core does not support them, then `UpdatePartitionSpec` should apply the same
   validation as the PartitionSpec builder. Either change would reduce the
   potential for issues when compute engines use the relevant Iceberg
   interfaces.
   
   
   ### Query engine
   
   PrestoDB
   
   ### Willingness to contribute
   
   - [ ] I can contribute this improvement/feature independently
   - [x] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

