rdblue commented on issue #417: Adding support for time-based partitioning on 
long column type
URL: 
https://github.com/apache/incubator-iceberg/issues/417#issuecomment-526384694
 
 
   @shardulm94, do you intend to use this for tables with existing data?
   
   If you do intend to use this with existing tables, then I'm not sure that 
you want to use the time-based hidden partitioning transforms. The problem with 
changing the partitioning of existing tables that use identity partitioning is 
that your queries may start to fail because you're no longer producing the old 
partition columns (e.g., `ts_date=cast(cast(ts as date) as string)`). And if 
you are still producing the old partition columns, then there's not much point 
in adding extra time-based partitioning: splits will already be pruned using 
the time ranges in the min/max column metadata.
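To make the pruning point concrete, here is a minimal Java sketch. It is not Iceberg code: the class and method names are hypothetical, and it assumes the date cast happens in UTC. It contrasts the old identity partition string with a day ordinal like the one a day transform derives, and it shows the min/max overlap check that lets a file be skipped even without time-based partitioning.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class PartitionSketch {
    // Old identity partition value: roughly cast(cast(ts as date) as string),
    // assuming the date cast is evaluated in UTC.
    static String legacyTsDate(Instant ts) {
        return ts.atZone(ZoneOffset.UTC).toLocalDate().toString();
    }

    // Day ordinal (days since 1970-01-01 UTC), the kind of value a
    // time-based day transform derives from the timestamp itself.
    static long dayOrdinal(Instant ts) {
        return ts.atZone(ZoneOffset.UTC).toLocalDate().toEpochDay();
    }

    // Min/max pruning: a file whose [fileMin, fileMax] range does not
    // overlap the query range [lo, hi] can be skipped.
    static boolean canSkip(Instant fileMin, Instant fileMax, Instant lo, Instant hi) {
        return fileMax.isBefore(lo) || fileMin.isAfter(hi);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2019-08-29T17:30:00Z");
        System.out.println(legacyTsDate(ts)); // 2019-08-29
        System.out.println(dayOrdinal(ts));   // 18137

        // Hypothetical file stats and query range.
        Instant fileMin = Instant.parse("2019-08-01T00:00:00Z");
        Instant fileMax = Instant.parse("2019-08-15T23:59:59Z");
        Instant lo = Instant.parse("2019-08-29T00:00:00Z");
        Instant hi = Instant.parse("2019-08-30T00:00:00Z");
        System.out.println(canSkip(fileMin, fileMax, lo, hi)); // true
    }
}
```

The skip decision only needs the per-file min/max of `ts`. That is why an extra time-based partition adds little when the old partition column is still being written.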
   
   If you don't intend to use existing data, then do normal timestamp columns 
work for you?
   
   I guess there's another case, where you want to rebuild the table metadata 
but use the old data files. In that case, is there anything to distinguish the 
data in these columns from timestamps with a different format, like long values 
that store microseconds since the epoch?
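A small Java sketch shows why the unit matters. The stored value and the names are hypothetical. The same long decodes to very different instants depending on whether it is read as milliseconds or microseconds since the epoch, and nothing in the value itself says which reading is right.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class LongUnitAmbiguity {
    // Interpret the long as milliseconds since 1970-01-01T00:00:00Z.
    static Instant asMillis(long value) {
        return Instant.ofEpochMilli(value);
    }

    // Interpret the same long as microseconds since the epoch.
    static Instant asMicros(long value) {
        return Instant.EPOCH.plus(value, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        long raw = 1_567_099_800_000L; // hypothetical value read from a long column
        System.out.println(asMillis(raw)); // 2019-08-29T17:30:00Z
        System.out.println(asMicros(raw)); // 1970-01-19T03:18:19.800Z
    }
}
```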
   
   The problem is correctness when other people start using this. If Iceberg 
supports interpreting a long column as an instant, then it must be obvious what 
unit the long values are in. Maybe we could allow this if the column name 
includes some clue, like `timestamp_millis` vs `timestamp_micros`, but that 
sounds hacky to me.
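For illustration only, the column-name heuristic could look something like the hypothetical Java helper below (`unitFromName` is a made-up name, not an Iceberg API). It also shows the weakness: any name without a recognized suffix yields no unit at all, and a misleading suffix yields a wrong one.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class UnitFromName {
    // Hypothetical heuristic: infer the unit of a long column from its name suffix.
    static Optional<TimeUnit> unitFromName(String columnName) {
        String name = columnName.toLowerCase(Locale.ROOT);
        if (name.endsWith("_millis")) {
            return Optional.of(TimeUnit.MILLISECONDS);
        } else if (name.endsWith("_micros")) {
            return Optional.of(TimeUnit.MICROSECONDS);
        }
        return Optional.empty(); // no clue in the name: the unit is unknown
    }

    public static void main(String[] args) {
        System.out.println(unitFromName("timestamp_millis")); // Optional[MILLISECONDS]
        System.out.println(unitFromName("timestamp_micros")); // Optional[MICROSECONDS]
        System.out.println(unitFromName("event_time"));       // Optional.empty
    }
}
```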
   
   Another solution is to add a way to promote from long to timestamp type and 
store the units of the long in metadata somewhere. Then you would be able to 
use old data as real timestamp columns.
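One way to picture the promotion idea, as a sketch under assumptions: the unit is recorded once in metadata, and readers use it to convert old long values into microseconds since the epoch, which is the precision of Iceberg's timestamp type. The map and the `source-long-unit` key are made up for this example and are not an existing Iceberg property.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class PromotedLongColumn {
    // Hypothetical metadata key; Iceberg does not define this property.
    static final String UNIT_KEY = "source-long-unit";

    // Convert a stored long to microseconds since the epoch using the unit
    // recorded in metadata (a TimeUnit name such as "MILLISECONDS").
    // Fails loudly if no unit was recorded instead of guessing.
    static long toTimestampMicros(long stored, Map<String, String> columnMetadata) {
        String unit = columnMetadata.get(UNIT_KEY);
        if (unit == null) {
            throw new IllegalStateException("No unit recorded for promoted long column");
        }
        return TimeUnit.valueOf(unit).toMicros(stored);
    }

    public static void main(String[] args) {
        Map<String, String> meta = Collections.singletonMap(UNIT_KEY, "MILLISECONDS");
        System.out.println(toTimestampMicros(1_567_099_800_000L, meta)); // 1567099800000000
    }
}
```

Recording the unit in metadata makes the conversion explicit and checkable, and the old data files can be read as timestamps without being rewritten.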
