[ https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881585#comment-16881585 ]

Joris Van den Bossche edited comment on ARROW-5895 at 7/9/19 10:07 PM:
-----------------------------------------------------------------------

What changed in 0.14.0 compared to 0.13 is that timestamp columns are now 
also annotated with the new LogicalType (e.g. {{TIMESTAMP(unit=MICROS)}}) in 
addition to the older ConvertedType ({{TIMESTAMP_MILLIS/MICROS}}). However, 
there is a compatibility problem: the older ConvertedType is omitted for 
tz-naive data (see ARROW-5878). 

Could you try with timezone-aware data to check whether you encounter the 
same issue? It might be that the S3 Parquet reader does not yet understand 
the new LogicalTypes; without the ConvertedType annotation it would then 
interpret the column as plain integers (as you see in the output).

I don't think there is an option to *not* write those new LogicalTypes, but the 
omission of the ConvertedType annotation is a bug that should be fixed for 
0.14.1.



was (Author: jorisvandenbossche):
What changed in 0.14.0 compared to 0.13 is that timestamp columns are now 
also annotated with the new LogicalType (e.g. {{TIMESTAMP(unit=MICROS)}}) in 
addition to the older ConvertedType ({{TIMESTAMP_MILLIS/MICROS}}). However, 
there is a compatibility problem: the older ConvertedType is omitted for 
tz-naive data (see ARROW-5889). 

Could you try with timezone-aware data to check whether you encounter the 
same issue? It might be that the S3 Parquet reader does not yet understand 
the new LogicalTypes; without the ConvertedType annotation it would then 
interpret the column as plain integers (as you see in the output).

I don't think there is an option to *not* write those new LogicalTypes, but the 
omission of the ConvertedType annotation is a bug that should be fixed for 
0.14.1.


> [Python] New version stores timestamps as epoch ms instead of ISO timestamp string
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-5895
>                 URL: https://issues.apache.org/jira/browse/ARROW-5895
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>         Environment: Linux dev.office.whoop.com 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: John Wilson
>            Priority: Major
>
> Just upgraded from pyarrow 0.13 to 0.14.
> Columns of type TimestampType(timestamp[ns]) now get written as epoch ms 
> values: 
> 1561939200507
> whereas 0.13 wrote TimestampType(timestamp[ns]) as an ISO string:
> 2019-07-01T00:00:00.507Z
> This broke my implementation. How do I get pyarrow to write ISO strings 
> again in 0.14?
>  
> Here is my table write:
> {code:python}
> pyarrow.parquet.write_to_dataset(table=tbl, root_path=local_path,
>                                  partition_cols=['env', 'dt'],
>                                  coerce_timestamps='ms',
>                                  allow_truncated_timestamps=True,
>                                  version='2.0',
>                                  compression='SNAPPY')
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)