Jordan Samuels created ARROW-1957:
-------------------------------------
Summary: Handle nanosecond timestamps in parquet serialization
Key: ARROW-1957
URL: https://issues.apache.org/jira/browse/ARROW-1957
Project: Apache Arrow
Issue Type: Improvement
Affects Versions: 0.8.0
Environment: Python 3.6.4, Mac OSX
Reporter: Jordan Samuels
Priority: Minor
The following code
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
n = 3
df = pd.DataFrame({'x': range(n)},
                  index=pd.DatetimeIndex(start='2017-01-01', freq='1n', periods=n))
pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet')
{code}
results in:
{{ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:
1483228800000000001}}
The desired behavior is that nanosecond-resolution timestamps can be written
without losing precision (e.g. through conversion to a coarser unit such as
us or ms). Note that if {{freq='1u'}} (microsecond steps) is used, the code
runs without error.
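To illustrate why the cast is rejected, here is a minimal pure-Python sketch of a lossy-cast check. It is only an illustration of the arithmetic, not pyarrow's actual implementation; the helper name {{ns_to_us_checked}} and the constant {{NS_PER_US}} are hypothetical, and the index values are assumed to be 2017-01-01 00:00:00 UTC plus 0, 1 and 2 nanoseconds, as in the example above.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond


def ns_to_us_checked(value_ns: int) -> int:
    """Convert a nanosecond timestamp to microseconds.

    Hypothetical helper: raises if the value has a non-zero
    sub-microsecond part, i.e. if the conversion would drop data.
    """
    quotient, remainder = divmod(value_ns, NS_PER_US)
    if remainder != 0:
        raise ValueError(
            "Casting from timestamp[ns] to timestamp[us] "
            f"would lose data: {value_ns}"
        )
    return quotient


# 2017-01-01 00:00:00 UTC as nanoseconds since the Unix epoch.
start_ns = 1483228800 * 1_000_000_000

# Three index values spaced one nanosecond apart (freq='1n', periods=3).
index_ns = [start_ns + i for i in range(3)]

# The first value is a whole number of microseconds and converts cleanly.
first_us = ns_to_us_checked(index_ns[0])

# The second value has a 1 ns remainder, so the checked conversion fails.
try:
    ns_to_us_checked(index_ns[1])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

With microsecond steps ({{freq='1u'}}) every value is a multiple of 1000 ns, so the same check passes for the whole index, which is consistent with the behavior described above.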