[ https://issues.apache.org/jira/browse/ARROW-12732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342342#comment-17342342 ]

Joris Van den Bossche commented on ARROW-12732:
-----------------------------------------------

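The reason for this is that the {{ArrowPeriodType}} extension type is defined 
by pandas, and it is only registered with Arrow as an extension type when 
pandas is first imported/used. 

pyarrow does not import pandas by default, only when you use functionality 
that requires pandas (such as the {{to_pandas()}} call). So on the first 
{{read_table}} call the type is not yet registered and the column comes back 
with its {{int64}} storage type; once {{to_pandas()}} has pulled in pandas, 
subsequent reads in the same session restore {{ArrowPeriodType}}.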

> read parquet in pyarrow is not idempotent for time period types
> ---------------------------------------------------------------
>
>                 Key: ARROW-12732
>                 URL: https://issues.apache.org/jira/browse/ARROW-12732
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: Abderrahmane Jaidi
>            Priority: Major
>         Attachments: period.parquet
>
>
> When reading a parquet file (attached) with a period type column via the 
> "read_table" method, it returns "int64" on the first read. After applying 
> "to_pandas" to the pyarrow table, subsequent "read_table" calls on the same 
> parquet file in the same *Python session* return "ArrowPeriodType".
> {code:python}
> import pyarrow
> import pyarrow.parquet
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[1]: [DataType(int64)]
> print(pq_table.to_pandas())
> # Out[2]:
> # col
> # 0 2010-01
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[3]: [ArrowPeriodType(DataType(int64))]
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[4]: [ArrowPeriodType(DataType(int64))]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
