Abderrahmane Jaidi created ARROW-12732:
------------------------------------------

             Summary: read parquet in pyarrow is not idempotent for time period types
                 Key: ARROW-12732
                 URL: https://issues.apache.org/jira/browse/ARROW-12732
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, Python
    Affects Versions: 4.0.0, 3.0.0
            Reporter: Abderrahmane Jaidi
         Attachments: period.parquet

When a parquet file (attached) containing a period-type column is read with 
"read_table", the column's type is "int64" on the first read. After 
"to_pandas" is called on the resulting pyarrow table, subsequent "read_table" 
calls on the same file in the same *Python session* return "ArrowPeriodType" 
instead:
{code:python}
import pyarrow
import pyarrow.parquet


pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[1]: [DataType(int64)]

print(pq_table.to_pandas())
# Out[2]:
# col
# 0 2010-01

pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[3]: [ArrowPeriodType(DataType(int64))]

pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[4]: [ArrowPeriodType(DataType(int64))]
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
