boroknagyz opened a new issue #2043: URL: https://github.com/apache/iceberg/issues/2043
PartitionSpec.partitionToPath() creates a human-readable partition path: https://github.com/apache/iceberg/blob/083e70bd0e9be39c25170aee2ddb7e084527fed6/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L173

However, DataFiles.fillFromPath() invokes Conversions.fromPartitionString(): https://github.com/apache/iceberg/blob/083e70bd0e9be39c25170aee2ddb7e084527fed6/core/src/main/java/org/apache/iceberg/DataFiles.java#L84

which interprets the values with simple string parsing: https://github.com/apache/iceberg/blob/083e70bd0e9be39c25170aee2ddb7e084527fed6/api/src/main/java/org/apache/iceberg/types/Conversions.java#L44-L78

This causes problems with partition transforms. For example, the YEAR partition transform stores the number of years elapsed since 1970: https://iceberg.apache.org/spec/#partition-transforms

Therefore, if we store the timestamp '2021-01-07 11:58:19.523065' in a table that uses the YEAR partition transform, the partition path will be ts_year=2021 (human-readable). If we then set the partition data using DataFiles.fillFromPath() (or Builder.withPartitionPath()), the partition data will hold the value 2021 rather than 51, the number of years elapsed since 1970.

I think that, besides toHumanString(), partition transforms should also have a fromHumanString() method, and that this method should be used when parsing the partition path.
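To make the mismatch concrete, here is a minimal, self-contained sketch that does not depend on Iceberg. The class YearTransformSketch and its methods are hypothetical stand-ins written for illustration only. In particular, fromHumanString() is the method proposed in this issue and is assumed here not to exist in Iceberg at the referenced commit. The sketch only models the YEAR arithmetic described in the spec.

```java
import java.time.LocalDateTime;

// Hypothetical stand-in for a YEAR partition transform; not Iceberg code.
final class YearTransformSketch {
    private static final int EPOCH_YEAR = 1970;

    // YEAR transform as described in the spec: years elapsed since 1970.
    static int apply(LocalDateTime ts) {
        return ts.getYear() - EPOCH_YEAR;
    }

    // Human-readable form that ends up in the partition path.
    static String toHumanString(int yearsSinceEpoch) {
        return String.valueOf(EPOCH_YEAR + yearsSinceEpoch);
    }

    // Proposed inverse of toHumanString() (an assumption of this sketch).
    static int fromHumanString(String humanYear) {
        return Integer.parseInt(humanYear) - EPOCH_YEAR;
    }

    // What plain string-to-int parsing of the path segment yields.
    static int naiveParse(String humanYear) {
        return Integer.parseInt(humanYear);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.parse("2021-01-07T11:58:19.523065");
        int stored = apply(ts);                        // 51
        String path = "ts_year=" + toHumanString(stored);
        String raw = path.substring(path.indexOf('=') + 1);

        System.out.println(path);                      // ts_year=2021
        System.out.println(naiveParse(raw));           // 2021, not the stored value
        System.out.println(fromHumanString(raw));      // 51, matches the stored value
    }
}
```

With an inverse like this, parsing the path would round-trip to the same value that the transform produced, whereas plain integer parsing yields the calendar year.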
