Joris,

Thank you for the response. There's such a trail of stale information online about this topic overall that it wasn't clear what the status was. For example, simple searches take you into the "INT96 is deprecated, therefore support for nanoseconds is as well" cul-de-sac. Absent that confusing context, the existing error message is fine.

It's worth noting that accurate and precise timestamps down to ~0.1 ns are widely available, with 0.02 ns available for just a few thousand US dollars. I'll stick with usec resolution for absolute time and just use an int64 field for my nanosecond data.

Thanks again.

- db

On Thu, Oct 10, 2019 at 5:11 AM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:

> Hi David,
>
> This is intentional, see
> https://arrow.apache.org/docs/python/parquet.html#storing-timestamps for
> some explanation in the documentation. Basically, the parquet format only
> supports ms and us resolution, and so nanosecond timestamps (which are
> supported by Arrow) are converted to one of those resolutions.
>
> We could maybe clarify that better in the error message (something like
> "only 'ms' and 'us' are supported")?
>
> In the latest version of the parquet format specification, there is
> actually support for nanosecond resolution as well (
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-timestamp-convertedtype
> ).
> You can obtain this by specifying version="2.0", but the implementation is
> not yet fully ready (see https://issues.apache.org/jira/browse/PARQUET-458
> ),
> and also not all frameworks support this version (so if compatibility
> across processing frameworks is important, it is recommended to stick with
> version 1).
>
> Joris
>
> On Wed, 9 Oct 2019 at 21:27, David Boles <bibliobo...@gmail.com> wrote:
>
> > The following code dies with pyarrow 0.14.2:
> >
> > import pyarrow as pa
> > import pyarrow.parquet as pq
> >
> > schema = pa.schema([('timestamp', pa.timestamp('ns', tz='UTC')),])
> > writer = pq.ParquetWriter('foo.parquet', schema, coerce_timestamps='ns')
> >
> > ts_array = pa.array([ int(1234567893141) ], type=pa.timestamp('ns', tz='UTC'))
> > table = pa.Table.from_arrays([ ts_array ], names=['timestamp'])
> >
> > writer.write_table(table)
> > writer.close()
> >
> > with the message:
> >
> > ValueError: Invalid value for coerce_timestamps: ns
> >
> > That appears to be because of this code in _parquet.pxi:
> >
> > cdef int _set_coerce_timestamps(
> >         self, ArrowWriterProperties.Builder* props) except -1:
> >     if self.coerce_timestamps == 'ms':
> >         props.coerce_timestamps(TimeUnit_MILLI)
> >     elif self.coerce_timestamps == 'us':
> >         props.coerce_timestamps(TimeUnit_MICRO)
> >     elif self.coerce_timestamps is not None:
> >         raise ValueError('Invalid value for coerce_timestamps: {0}'
> >                          .format(self.coerce_timestamps))
> >
> > which restricts the choice to 'ms' or 'us', even though AFAICT everywhere
> > else also allows 'ns' (and there is a TimeUnit_NANO defined). Is this
> > intentional, or a bug?
> >
> > Thanks,
> >
> > - db
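
For anyone who finds this thread later, here is a minimal sketch of the arithmetic behind the workaround mentioned at the top of this message (microsecond resolution for absolute time, plus a plain int64 field for the nanosecond data). It is only one possible reading of that workaround: it assumes the int64 field holds the sub-microsecond remainder rather than the full nanosecond count. The names `NS_PER_US`, `split_ns`, and `join_ns` are hypothetical helpers, not part of pyarrow; the idea is that the first value would go into a `pa.timestamp('us')` column and the second into a `pa.int64()` column.

```python
from typing import Tuple

NS_PER_US = 1_000  # nanoseconds per microsecond


def split_ns(ns_since_epoch: int) -> Tuple[int, int]:
    """Split a nanosecond count into (microseconds, leftover nanoseconds).

    divmod floors toward negative infinity, so the leftover is always in
    [0, 1000), including for timestamps before the epoch.
    """
    return divmod(ns_since_epoch, NS_PER_US)


def join_ns(us_since_epoch: int, leftover_ns: int) -> int:
    """Rebuild the original nanosecond count from the two stored fields."""
    return us_since_epoch * NS_PER_US + leftover_ns


# The integer value used in the original report further down the thread.
raw_ns = 1234567893141
us_part, ns_part = split_ns(raw_ns)
restored = join_ns(us_part, ns_part)
```

Storing the full nanosecond count in the int64 field instead would also work and avoids the recombination step, at the cost of duplicating the coarse part of the timestamp.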