shardulm94 commented on issue #417: Adding support for time-based partitioning on long column type
URL: https://github.com/apache/incubator-iceberg/issues/417#issuecomment-527746965

@rdblue Let's consider new tables first.

Most of the source data that we ingest has an event time of long type, expressed as milliseconds since the UTC epoch (millisSinceUTCEpoch). It is not feasible to change it at the source. One option I can think of is to transform the source event time column, in a layer above Iceberg, into a data type that Iceberg supports partitioning on, and then provide it to Iceberg as a new column. I would like to avoid adding another column to the data, but as a last resort it may be fine.

I agree that deriving the unit from the column name seems hacky. The ability to promote long types to timestamp based on some metadata (unit, maybe tz too?) sounds good. However, I am not sure how much we can generalize this to other data types. Would that be overfitting?

With respect to tables with existing data, the issue is that the original data files don't have the partition value as a column. So I was under the assumption that identity partitioning cannot be applied here, since it depends on the column data being present in the file. However, looking at the Iceberg reader code, it seems that for identity partitioning the column value is derived from metadata. Does that behavior hold in general? If yes, that probably solves the issue for existing tables. I couldn't find anything in the spec that mentions this, though.
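To make the first option concrete (converting the long event time in a layer above Iceberg), here is a minimal sketch in plain Python. It does not use any Iceberg API. The field names `event_time` and `event_ts` and the helper names are hypothetical, and the record is assumed to be a simple dict. It shows the conversion to a UTC timestamp and the days-since-epoch value that a daily partition would, as far as I understand, be based on.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the source column holds milliseconds since 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def millis_to_timestamp(event_time_ms: int) -> datetime:
    """Interpret a long as milliseconds since the UTC epoch."""
    return EPOCH + timedelta(milliseconds=event_time_ms)


def day_ordinal(event_time_ms: int) -> int:
    """Days since 1970-01-01 UTC. Floor division keeps pre-epoch values correct."""
    return event_time_ms // MILLIS_PER_DAY


def add_event_ts(record: dict, source_field: str = "event_time") -> dict:
    """Return a copy of the record with a derived timestamp column.

    'event_time' and 'event_ts' are hypothetical field names for illustration.
    """
    out = dict(record)
    out["event_ts"] = millis_to_timestamp(record[source_field])
    return out


row = add_event_ts({"id": 7, "event_time": 1567555200000})
print(row["event_ts"].isoformat(), day_ordinal(row["event_time"]))
```

The downside mentioned above is visible here: the derived `event_ts` duplicates information already present in `event_time`, which is why promoting the long column based on unit metadata would be preferable if it can be supported.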
