[
https://issues.apache.org/jira/browse/ARROW-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882452#comment-16882452
]
TP Boudreau edited comment on ARROW-5889 at 7/10/19 10:13 PM:
--------------------------------------------------------------
[~wesmckinn]: I think you're right, but that's a tougher case. Here, taking
the position that TIMESTAMP converted type is ambiguous, we can abstain from
implementing the spec. There, taking the consistent position, we have to
violate the spec. Still, I would support doing it for a regression patch fix.
We might want to explore the possibility of allowing the user (through a new
file property perhaps) to choose whether they expect strict adherence to the
spec re TIMESTAMPs.
[~jorisvandenbossche]: see for example
[https://github.com/G-Research/ParquetSharp], which wraps parquet-cpp in C# the
way PyArrow does for Python. But there are probably lots of others.
was (Author: tpboudreau):
[~wesmckinn]: I think you're right, but that's a tougher case. Here, taking
the position that TIMESTAMP converted type is ambiguous, we can abstain from
implementing the spec. There, taking the consistent position, we have to
violate the spec. Still, I would support doing it for a regression patch fix.
We might want to explore the possibility of allowing the user (through a new
file property perhaps) to choose whether they expect strict adherence to the
spec re TIMESTAMPs. (Probably another can of worms I'll later regret
suggesting.)
[~jorisvandenbossche]: see for example
[https://github.com/G-Research/ParquetSharp], which wraps parquet-cpp in C# the
way PyArrow does for Python. But there are probably lots of others.
> [Python][C++] Parquet backwards compat for timestamps without timezone broken
> -----------------------------------------------------------------------------
>
> Key: ARROW-5889
> URL: https://issues.apache.org/jira/browse/ARROW-5889
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.14.0
> Reporter: Florian Jetter
> Assignee: TP Boudreau
> Priority: Minor
> Labels: parquet
> Fix For: 0.14.1
>
> Attachments: 0.12.1.parquet, 0.13.0.parquet
>
>
> When reading a Parquet file that has timestamp fields, they are read as
> timestamps with timezone UTC if the file was written by pyarrow 0.13.0
> or 0.12.1.
> The expected behavior is that they are loaded as timestamps without any
> timezone information.
> The attached files contain one row for all basic types and a few nested
> types; the timestamp fields are called datetime64 and datetime64_tz.
> See also:
> [https://github.com/JDASoftwareGroup/kartothek/tree/master/reference-data/arrow-compat]
> [https://github.com/JDASoftwareGroup/kartothek/blob/c47e52116e2dc726a74d7d6b97922a0252722ed0/tests/serialization/test_arrow_compat.py#L31]
>