The following code dies with pyarrow 0.14.2:

import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([('timestamp', pa.timestamp('ns', tz='UTC')),])
writer = pq.ParquetWriter('foo.parquet', schema, coerce_timestamps='ns')

ts_array = pa.array([1234567893141], type=pa.timestamp('ns', tz='UTC'))
table = pa.Table.from_arrays([ ts_array ], names=['timestamp'])

writer.write_table(table)
writer.close()

with the message:

ValueError: Invalid value for coerce_timestamps: ns

That appears to be because of this code in _parquet.pxi:

    cdef int _set_coerce_timestamps(
            self, ArrowWriterProperties.Builder* props) except -1:
        if self.coerce_timestamps == 'ms':
            props.coerce_timestamps(TimeUnit_MILLI)
        elif self.coerce_timestamps == 'us':
            props.coerce_timestamps(TimeUnit_MICRO)
        elif self.coerce_timestamps is not None:
            raise ValueError('Invalid value for coerce_timestamps: {0}'
                             .format(self.coerce_timestamps))

which restricts the choice to 'ms' or 'us', even though, as far as I can
tell, everywhere else in pyarrow also accepts 'ns' (and TimeUnit_NANO is
defined). Is this intentional, or a bug?
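
To make the question concrete, here is a minimal pure-Python sketch of that check with an extra 'ns' case. It does not call pyarrow at all. The function name resolve_coerce_timestamps, the allow_nano flag, and the string stand-ins for the TimeUnit constants are made up for illustration; none of them exist in pyarrow. It also assumes that accepting 'ns' would only need one more branch, which may not hold if the Parquet writer underneath cannot store nanoseconds.

```python
# Pure-Python model of the unit check quoted above; it does not call pyarrow.
# The 'ns' entry is a hypothetical addition, not pyarrow's actual behaviour.
ACCEPTED_UNITS = {
    'ms': 'TimeUnit_MILLI',
    'us': 'TimeUnit_MICRO',
}


def resolve_coerce_timestamps(unit, allow_nano=False):
    """Return the name of the time unit constant for `unit`, or None if unset."""
    if unit is None:
        return None  # no coercion requested, same as the original code path
    accepted = dict(ACCEPTED_UNITS)
    if allow_nano:
        accepted['ns'] = 'TimeUnit_NANO'  # hypothetical extra branch
    try:
        return accepted[unit]
    except KeyError:
        # Same message as the ValueError raised in _parquet.pxi
        raise ValueError(
            'Invalid value for coerce_timestamps: {0}'.format(unit))
```

With allow_nano left at False this reproduces the error above for 'ns'; with allow_nano=True the same call returns the nanosecond unit instead of raising.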

Thanks,

 - db
