Abderrahmane Jaidi created ARROW-12732:
------------------------------------------
Summary: read parquet in pyarrow is not idempotent for time period
types
Key: ARROW-12732
URL: https://issues.apache.org/jira/browse/ARROW-12732
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, Python
Affects Versions: 4.0.0, 3.0.0
Reporter: Abderrahmane Jaidi
Attachments: period.parquet
When reading the attached parquet file, which has a period-type column, via
the "read_table" method, the column comes back as "int64" on the first read.
After calling "to_pandas" on the resulting pyarrow table, subsequent
"read_table" calls on the same parquet file in the same *Python session*
return "ArrowPeriodType" instead.
{code:python}
import pyarrow
import pyarrow.parquet
pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[1]: [DataType(int64)]
print(pq_table.to_pandas())
# Out[2]:
#        col
# 0  2010-01
pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[3]: [ArrowPeriodType(DataType(int64))]
pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[4]: [ArrowPeriodType(DataType(int64))]
{code}
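A possible explanation, which is an assumption and has not been verified in this report: the period extension type may only be registered with pyarrow once "to_pandas" has pulled in the relevant pandas code, so any read before that point falls back to the int64 storage type. The sketch below is a toy model of such a lazily filled registry in plain Python. The names `_REGISTRY`, `register_period_type` and `read_type`, and the extension name "pandas.period", are hypothetical stand-ins, not pyarrow or pandas APIs.

```python
# Toy model only: none of these names are real pyarrow or pandas APIs.
_REGISTRY = {}  # extension name -> function wrapping the storage type


def register_period_type():
    """Stand-in for the registration assumed to happen inside to_pandas()."""
    _REGISTRY["pandas.period"] = lambda storage: f"ArrowPeriodType({storage})"


def read_type(storage, extension_name):
    """Resolve a column type the way a registry-backed reader might."""
    wrap = _REGISTRY.get(extension_name)
    # Unknown extension name: fall back to the plain storage type.
    return wrap(storage) if wrap else storage


first = read_type("int64", "pandas.period")   # registry still empty
register_period_type()                        # assumed side effect of to_pandas()
second = read_type("int64", "pandas.period")  # registry now knows the type
print(first, second)
```

If this model matches what actually happens, registering the type before the first read (or never registering it implicitly) would make repeated reads return the same schema.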
--
This message was sent by Atlassian Jira
(v8.3.4#803005)