[https://issues.apache.org/jira/browse/ARROW-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882452#comment-16882452]

TP Boudreau edited comment on ARROW-5889 at 7/10/19 10:13 PM:
--------------------------------------------------------------

[~wesmckinn]: I think you're right, but that's a tougher case. Here, taking 
the position that the TIMESTAMP converted type is ambiguous, we can abstain from 
implementing the spec. There, taking the consistent position, we have to 
violate the spec. Still, I would support doing it in a patch release to fix the regression. 
We might want to explore allowing the user (perhaps through a new 
file property) to choose whether they expect strict adherence to the 
spec regarding TIMESTAMPs.
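A minimal sketch of that file-property idea, in Python using only the standard library. `ReaderProperties`, `strict_timestamp_spec`, and `resolve_timezone` are hypothetical names, not part of parquet-cpp or PyArrow. The sketch assumes the spec maps the legacy TIMESTAMP_MILLIS/TIMESTAMP_MICROS converted types to isAdjustedToUTC=true, and that a LogicalType annotation, when present, takes precedence over the converted type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConvertedType(Enum):
    """Subset of Parquet's legacy (Thrift) converted types relevant here."""
    NONE = "NONE"
    TIMESTAMP_MILLIS = "TIMESTAMP_MILLIS"
    TIMESTAMP_MICROS = "TIMESTAMP_MICROS"


@dataclass(frozen=True)
class ReaderProperties:
    # Hypothetical switch: True follows the spec's mapping of the legacy
    # TIMESTAMP_* converted types to isAdjustedToUTC=true; False reads them
    # as timezone-naive, as files from older writers expect.
    strict_timestamp_spec: bool = False


def resolve_timezone(converted: ConvertedType,
                     logical_is_adjusted_to_utc: Optional[bool],
                     props: ReaderProperties) -> Optional[str]:
    """Return "UTC" for a tz-aware Arrow timestamp, None for tz-naive."""
    # A LogicalType annotation, when present, is unambiguous: honor it.
    if logical_is_adjusted_to_utc is not None:
        return "UTC" if logical_is_adjusted_to_utc else None
    # Only the legacy converted type is available: let the user decide.
    if converted in (ConvertedType.TIMESTAMP_MILLIS,
                     ConvertedType.TIMESTAMP_MICROS):
        return "UTC" if props.strict_timestamp_spec else None
    return None
```

Defaulting `strict_timestamp_spec` to False keeps files written by older writers (such as the attached 0.12.1 and 0.13.0 files) reading as timezone-naive, while users who want strict spec behavior can opt in.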

 

[~jorisvandenbossche]: see for example 
[https://github.com/G-Research/ParquetSharp], which wraps parquet-cpp in C# much as 
PyArrow does for Python. But there are probably many others.


was (Author: tpboudreau):
[~wesmckinn]: I think you're right, but that's a tougher case. Here, taking 
the position that the TIMESTAMP converted type is ambiguous, we can abstain from 
implementing the spec. There, taking the consistent position, we have to 
violate the spec. Still, I would support doing it in a patch release to fix the regression. 
We might want to explore allowing the user (perhaps through a new 
file property) to choose whether they expect strict adherence to the 
spec regarding TIMESTAMPs. (Probably another can of worms I'll later regret 
suggesting.)

 

[~jorisvandenbossche]: see for example 
[https://github.com/G-Research/ParquetSharp], which wraps parquet-cpp in C# much as 
PyArrow does for Python. But there are probably many others.

> [Python][C++] Parquet backwards compat for timestamps without timezone broken
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-5889
>                 URL: https://issues.apache.org/jira/browse/ARROW-5889
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.14.0
>            Reporter: Florian Jetter
>            Assignee: TP Boudreau
>            Priority: Minor
>              Labels: parquet
>             Fix For: 0.14.1
>
>         Attachments: 0.12.1.parquet, 0.13.0.parquet
>
>
> When reading a Parquet file that has timestamp fields, the fields are read as 
> timestamps with timezone UTC if the file was written by pyarrow 0.13.0 
> or 0.12.1.
> The expected behavior is that they are loaded as timestamps without any 
> timezone information.
> The attached files contain one row for all basic types and a few nested 
> types; the timestamp fields are called datetime64 and datetime64_tz.
> See also:
> [https://github.com/JDASoftwareGroup/kartothek/tree/master/reference-data/arrow-compat]
> [https://github.com/JDASoftwareGroup/kartothek/blob/c47e52116e2dc726a74d7d6b97922a0252722ed0/tests/serialization/test_arrow_compat.py#L31]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
