If we were to use two columns (a timestamp and the nanos as a long), how would 
partitioning and sorting work? I imagine we’d just partition on the timestamp 
column but sort on both the timestamp and nanos columns?
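
To make the question concrete, here is a minimal pure-Python sketch of that layout. It does not use any Iceberg API: the names `to_row` and `partition_and_sort`, the keys `ts` and `nanos`, and the choice of a daily partition are all assumptions for illustration, and `nanos` is assumed to hold the full nanoseconds since the epoch.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_row(epoch_nanos: int) -> dict:
    """Build a two-column row: a microsecond timestamp plus the raw nanos."""
    # Floor division drops the sub-microsecond digits from the timestamp only.
    ts = EPOCH + timedelta(microseconds=epoch_nanos // 1_000)
    return {"ts": ts, "nanos": epoch_nanos}


def partition_and_sort(rows):
    """Group rows by the day of `ts`, then sort each group by (ts, nanos)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["ts"].date()].append(row)
    for day_rows in partitions.values():
        # `nanos` breaks ties between rows that share the same microsecond.
        day_rows.sort(key=lambda r: (r["ts"], r["nanos"]))
    return dict(partitions)
```

In this sketch the partition value depends only on the timestamp column, while the nanos column only affects ordering among rows that share a microsecond.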

From: Ryan Blue <b...@apache.org>
Sent: Friday, May 21, 2021 6:09 PM
To: dev@iceberg.apache.org
Subject: Re: Non-microsecond timestamps

Hi Tina,

For millisecond timestamps, I'd recommend just converting them to microseconds 
and using the timestamp type. That at least isn't too bad.
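
A minimal sketch of that conversion, assuming the values are integer counts of milliseconds since the Unix epoch. The helper names are hypothetical and are not part of Iceberg.

```python
MICROS_PER_MILLI = 1_000


def millis_to_micros(epoch_millis: int) -> int:
    """Widen an epoch-millisecond value to epoch microseconds (lossless)."""
    return epoch_millis * MICROS_PER_MILLI


def micros_to_millis(epoch_micros: int) -> int:
    """Narrow epoch microseconds back to milliseconds.

    Floor division is exact for values that originally came from
    milliseconds; any other value loses its sub-millisecond digits.
    """
    return epoch_micros // MICROS_PER_MILLI
```

Because the multiplication is exact, a value written as microseconds can be read back as the original millisecond count.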

For nanosecond timestamps, that's a more difficult problem. We originally 
chose not to add this because it complicates the spec, but we may 
add it in future versions. Right now, you may want to consider storing the 
nanos in a separate field. For example, you could store two columns, `ts_nanos 
long` and `ts_micros timestamp`. For filtering and most operations, you'd use 
the timestamp field but for other operations you could use the raw nanoseconds 
field. That would make it a bit harder to work with the table, but you'd be 
able to get the precision you need.
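
As a rough illustration of the two-column idea, the sketch below derives both values from one nanosecond reading. It is plain Python rather than Iceberg code; `split_nanos` is a hypothetical helper, and the input is assumed to be integer nanoseconds since the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NANOS_PER_MICRO = 1_000


def split_nanos(epoch_nanos: int) -> dict:
    """Return the values for the `ts_micros` and `ts_nanos` columns."""
    # Floor to microseconds for the timestamp column; flooring (rather than
    # truncating toward zero) keeps pre-epoch values ordered consistently.
    micros = epoch_nanos // NANOS_PER_MICRO
    return {
        "ts_micros": EPOCH + timedelta(microseconds=micros),
        "ts_nanos": epoch_nanos,  # full precision, kept as a plain long
    }


row = split_nanos(1_621_620_540_123_456_789)
```

Filters and partitioning would then reference `ts_micros`, and only code that needs the last three digits would read `ts_nanos`.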

I hope that helps. If you'd like to help us think through adding a nanosecond 
timestamp type, we can consider doing that also.

Ryan

On Fri, May 21, 2021 at 10:25 AM Tina Luo <tina....@twosigma.com> wrote:
Hi,

Are there recommendations for or work on using non-microsecond timestamps with 
Iceberg? For our use case, we’d want to read, write, and partition with 
non-microsecond timestamps (specifically, millisecond and nanosecond).

For milliseconds, we could convert to and from microseconds outside of Iceberg 
although this is not ideal. But this doesn’t work for nanoseconds without 
losing precision.

In a similar issue <https://github.com/trinodb/trino/issues/1284>, there were 
suggestions for Iceberg to support custom transforms for timestamps. Is that in 
the plan?

Thanks,
Tina


--
Ryan Blue
