shardulm94 commented on issue #417: Adding support for time-based partitioning 
on long column type
URL: 
https://github.com/apache/incubator-iceberg/issues/417#issuecomment-527746965
 
 
   @rdblue Let's consider new tables first.
   Most of the source data that we ingest has an event time stored as milliseconds since the UTC epoch in a long column, and it is not feasible to change this at the source. One option I can think of is to transform the source event-time column, in a layer above Iceberg, into a datatype that Iceberg supports partitioning on, and provide it to Iceberg as a new column. I would like to avoid adding another column to the data, though as a last resort that may be fine.
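   The transform-in-a-layer-above option can be sketched as follows. This is not Iceberg API code; it is a minimal illustration, in Python, of deriving a timestamp column (the hypothetical `event_ts`) from the source long column in the ingestion layer, so that a partition spec like `day(event_ts)` could then apply:

```python
from datetime import datetime, timezone

def millis_to_timestamp(event_time_ms: int) -> datetime:
    """Convert a long epoch-millis value to a timezone-aware UTC timestamp."""
    return datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc)

# The ingestion layer would add a derived column like this alongside the
# original long column, and Iceberg would partition on the derived one.
record = {"event_time": 1567728000000}          # source long column (epoch millis)
record["event_ts"] = millis_to_timestamp(record["event_time"])
```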
   
   The idea of deriving the unit from the column name seems hacky, agreed. The ability to promote long types to timestamps based on some metadata (the unit, and maybe the timezone too?) sounds good. However, I am not sure how far we can generalize this to other data types; would it end up too special-cased?
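   To make the metadata-driven promotion idea concrete, here is a hedged sketch (the `unit` key and the set of supported units are assumptions, not anything Iceberg defines today) of promoting a long to a timestamp based on a unit recorded in column metadata rather than in the column name:

```python
from datetime import datetime, timezone

# Hypothetical unit metadata -> divisor to reach seconds.
UNIT_DIVISORS = {"s": 1, "ms": 1_000, "us": 1_000_000}

def promote_long(value: int, unit: str) -> datetime:
    """Promote a long column value to a UTC timestamp using unit metadata."""
    if unit not in UNIT_DIVISORS:
        raise ValueError(f"unsupported unit: {unit}")
    return datetime.fromtimestamp(value / UNIT_DIVISORS[unit], tz=timezone.utc)
```

A timezone entry in the same metadata could drive the `tz` argument instead of hard-coding UTC.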
   
   With respect to tables with existing data, the issue is that the original data files don't have the partition value as a column. I was therefore under the assumption that identity partitioning cannot be applied here, since it depends on the column data being present in the file. However, looking at the Iceberg reader code, it seems that for identity partitioning the column value is derived from metadata. Will that assumption hold true in general? If yes, that probably solves the issue for existing tables. I couldn't find anything in the spec that mentions this, though.
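   The behavior in question can be sketched as follows. This is not Iceberg's actual reader code, just an illustration of the assumption: for an identity partition, a reader can fill in the column from the partition tuple stored in table metadata whenever the column is missing from the data file itself:

```python
def read_row(file_row: dict, partition_values: dict) -> dict:
    """Merge a row read from a data file with identity-partition values
    taken from metadata, preferring values present in the file."""
    row = dict(file_row)
    for col, value in partition_values.items():
        # Only fill the column when the data file did not contain it.
        row.setdefault(col, value)
    return row

# A file written before the column existed still yields the partition value.
row = read_row({"id": 1}, {"event_date": "2019-09-05"})
```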
