If we were to use two columns (a timestamp and the nanos as a long), how would partitioning and sorting work? I imagine we’d just partition on the timestamp column but sort on both the timestamp and nanos columns?
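
To make the question concrete, here is a rough sketch of what I have in mind. It is plain Python, not Iceberg API code. It assumes the column names `ts_micros` and `ts_nanos` from the suggestion quoted below, and it assumes `ts_nanos` holds the full nanoseconds since the epoch rather than only the sub-microsecond remainder. The helpers `split_nanos` and `day_partition` are hypothetical names used only for illustration; `day_partition` imitates a day-granularity partition (days since 1970-01-01) computed from the microsecond column.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_nanos(epoch_nanos: int):
    """Hypothetical helper: derive (ts_micros, ts_nanos) from raw epoch nanoseconds."""
    # Floor division drops the sub-microsecond digits without changing the ordering.
    micros = epoch_nanos // 1_000
    ts_micros = EPOCH + timedelta(microseconds=micros)
    # Assumption: ts_nanos keeps the full raw value, so no precision is lost.
    return ts_micros, epoch_nanos


def day_partition(ts_micros: datetime) -> int:
    """Hypothetical helper: days since 1970-01-01, taken from the microsecond column only."""
    return (ts_micros - EPOCH).days


# Two events that share the same microsecond, plus one from the previous day.
events = [
    1_621_620_540_000_001_500,
    1_621_620_540_000_001_200,
    1_621_534_140_000_000_999,
]
rows = [split_nanos(n) for n in events]

# Partition on the timestamp column alone.
partitions = [day_partition(ts_micros) for ts_micros, _ in rows]

# Sort on (ts_micros, ts_nanos); the nanos column breaks ties within a microsecond.
by_both = sorted(rows, key=lambda r: (r[0], r[1]))
# Under the assumption above, sorting on ts_nanos alone yields the same order.
by_nanos = sorted(rows, key=lambda r: r[1])
```

If `ts_nanos` instead stored only the remainder within the microsecond, the sort would need both columns, since the remainder by itself would no longer order rows across different microseconds.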

From: Ryan Blue <b...@apache.org>
Sent: Friday, May 21, 2021 6:09 PM
To: dev@iceberg.apache.org
Subject: Re: Non-microsecond timestamps

Hi Tina,

For millisecond timestamps, I'd recommend just converting them to microseconds and using the timestamp type. That at least isn't too bad.

For nanosecond timestamps, that's a more difficult problem. We originally didn't choose to add this because it adds complication to the spec, but we may add it in future versions. Right now, you may want to consider storing the nanos in a separate field. For example, you could store two columns, `ts_nanos long` and `ts_micros timestamp`. For filtering and most operations, you'd use the timestamp field but for other operations you could use the raw nanoseconds field. That would make it a bit harder to work with the table, but you'd be able to get the precision you need.

I hope that helps. If you'd like to help us think through adding a nanosecond timestamp type, we can consider doing that also.

Ryan

On Fri, May 21, 2021 at 10:25 AM Tina Luo <tina....@twosigma.com> wrote:

Hi,

Are there recommendations for or work on using non-microsecond timestamps with Iceberg? For our use case, we’d want to read, write, and partition with non-microsecond timestamps (specifically, millisecond and nanosecond). For milliseconds, we could convert to and from microseconds outside of Iceberg although this is not ideal. But this doesn’t work for nanoseconds without losing precision.

In a similar issue<https://github.com/trinodb/trino/issues/1284>, there were suggestions for Iceberg to support custom transforms for timestamps. Is that in the plan?

Thanks,
Tina

--
Ryan Blue