boroknagyz opened a new issue #2043:
URL: https://github.com/apache/iceberg/issues/2043


   The PartitionSpec.partitionToPath() method creates a human-readable
   partition path:
   
https://github.com/apache/iceberg/blob/083e70bd0e9be39c25170aee2ddb7e084527fed6/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L173
   
   However, DataFiles.fillFromPath() invokes Conversions.fromPartitionString()
   
https://github.com/apache/iceberg/blob/083e70bd0e9be39c25170aee2ddb7e084527fed6/core/src/main/java/org/apache/iceberg/DataFiles.java#L84
   
   fromPartitionString() interprets the values via simple string parsing:
   
https://github.com/apache/iceberg/blob/083e70bd0e9be39c25170aee2ddb7e084527fed6/api/src/main/java/org/apache/iceberg/types/Conversions.java#L44-L78
   
   This causes problems with partition transforms. For example, the YEAR
   partition transform stores the number of years elapsed since 1970:
https://iceberg.apache.org/spec/#partition-transforms
   
   Therefore, if we store the timestamp '2021-01-07 11:58:19.523065' and our
   table uses the YEAR partition transform, the partition path will be
   ts_year=2021 (human-readable), while the stored partition value is 51
   (2021 - 1970). If we then set the partition data using
   DataFiles.fillFromPath() (or Builder.withPartitionPath()), the partition
   data will hold the value 2021 instead of the number of years elapsed
   since 1970.
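
   To make the mismatch concrete, here is a minimal, JDK-only sketch. It
   does not call any Iceberg code: the class YearPartitionMismatch and its
   helpers yearOrdinal(), toHumanString() and parseNaively() are
   hypothetical stand-ins for the YEAR transform, its human-readable
   rendering, and plain integer parsing of the path value.

```java
import java.time.LocalDateTime;

class YearPartitionMismatch {
  // Hypothetical stand-in for the YEAR transform: years elapsed since 1970.
  static int yearOrdinal(LocalDateTime ts) {
    return ts.getYear() - 1970;
  }

  // Hypothetical stand-in for the human-readable rendering used in paths.
  static String toHumanString(int yearOrdinal) {
    return String.valueOf(1970 + yearOrdinal);
  }

  // Plain integer parsing of the path value, with no knowledge of the transform.
  static int parseNaively(String pathValue) {
    return Integer.parseInt(pathValue);
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.parse("2021-01-07T11:58:19.523065");
    int stored = yearOrdinal(ts);
    String path = "ts_year=" + toHumanString(stored);
    int parsed = parseNaively(path.substring(path.indexOf('=') + 1));

    System.out.println("stored value: " + stored);
    System.out.println("path:         " + path);
    System.out.println("parsed value: " + parsed);
  }
}
```

   The value written (51) and the value read back from the path (2021)
   differ, which is the inconsistency described above.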
   
   I think besides toHumanString(), partition transforms should have a 
fromHumanString() method as well, and this method should be used when parsing 
the partition path.
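
   A rough sketch of what such a round trip could look like, again using
   only the JDK. The ReversibleTransform interface and the YEAR and MONTH
   constants are hypothetical illustrations, not existing Iceberg API, and
   the "yyyy-MM" month format is an assumption about the human-readable
   form.

```java
import java.time.YearMonth;

class HumanStringRoundTrip {
  // Hypothetical interface: a transform that can parse its own human-readable form.
  interface ReversibleTransform {
    String toHumanString(int value);
    int fromHumanString(String humanString);
  }

  // YEAR: the stored value is the number of years since 1970.
  static final ReversibleTransform YEAR = new ReversibleTransform() {
    @Override
    public String toHumanString(int value) {
      return String.valueOf(1970 + value);
    }

    @Override
    public int fromHumanString(String humanString) {
      return Integer.parseInt(humanString) - 1970;
    }
  };

  // MONTH: the stored value is the number of months since 1970-01.
  static final ReversibleTransform MONTH = new ReversibleTransform() {
    @Override
    public String toHumanString(int value) {
      return YearMonth.of(1970, 1).plusMonths(value).toString();
    }

    @Override
    public int fromHumanString(String humanString) {
      YearMonth ym = YearMonth.parse(humanString);
      return (ym.getYear() - 1970) * 12 + (ym.getMonthValue() - 1);
    }
  };

  public static void main(String[] args) {
    String year = YEAR.toHumanString(51);
    String month = MONTH.toHumanString(612);
    System.out.println(year + " -> " + YEAR.fromHumanString(year));
    System.out.println(month + " -> " + MONTH.fromHumanString(month));
  }
}
```

   With an inverse like this, parsing the path value "2021" under the YEAR
   transform would give back 51 rather than 2021.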


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]