David Li created ARROW-10523:
--------------------------------
Summary: [Python] Pandas timestamps are inferred to have only
microsecond precision
Key: ARROW-10523
URL: https://issues.apache.org/jira/browse/ARROW-10523
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 2.0.0
Reporter: David Li
{code:python}
import pyarrow as pa
import pandas as pd

# A pandas Timestamp carrying a sub-microsecond (nanosecond) component
arr = pa.array([pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)])
print(arr)
print(arr.type)
{code}
This gives:
{noformat}
[
2020-01-01 00:00:00.000000
]
timestamp[us]
{noformat}
However, pandas Timestamps have nanosecond precision, which it would be nice
for type inference to preserve.
The reason is that TypeInferrer [hardcodes
microseconds|https://github.com/apache/arrow/blob/apache-arrow-2.0.0/cpp/src/arrow/python/inference.cc#L466]
because it only knows about the standard library datetime, so I'm treating
this as a feature request and not quite a bug. Of course, this can be worked
around easily by passing an explicit type such as pa.timestamp("ns") to
pa.array.