[ 
https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881578#comment-16881578
 ] 

John Wilson commented on ARROW-5895:
------------------------------------

OK, so the problem is with S3.

I pull data from a postgres DB and upload it to S3 as a parquet file.

When I run an S3 SELECT on the Parquet file written with pyarrow 0.13.0, I get 
back an ISO string:
[
    {
        "id": 516,
        "ts": "2019-04-19T00:09:11.226Z",
        ...
    }
]
After I upgrade to 0.14, S3 returns the same field as an integer epoch value.

The code is exactly the same; only the pyarrow version has changed:
[
    {
        "id": 2383028,
        "ts": 1561939200507,
        ...
    }
]
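
As a client-side stopgap, the integer that S3 SELECT now returns can be converted back into the ISO form. This is only a minimal sketch: it assumes the value is milliseconds since the Unix epoch in UTC, and the helper name `epoch_ms_to_iso` is hypothetical, not part of pyarrow or any AWS SDK.

```python
from datetime import datetime, timezone


def epoch_ms_to_iso(epoch_ms: int) -> str:
    """Hypothetical helper: format epoch milliseconds (assumed UTC) as ISO 8601."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(epoch_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Match the shape returned under 0.13: millisecond precision with a trailing "Z".
    return f"{dt:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z"


# Value taken from the issue description.
print(epoch_ms_to_iso(1561939200507))  # 2019-07-01T00:00:00.507Z
```

This does not change what pyarrow writes; it only reformats the value after the query returns.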
 

Here is a snippet of my write:

{{tbl = pyarrow.Table.from_pandas(df=df)}}

{{with tempfile.TemporaryDirectory() as local_path:}}
{{  pyarrow.parquet.write_to_dataset(table=tbl, root_path=local_path,}}
{{                                   partition_cols=['env', 'dt'],}}
{{                                   coerce_timestamps='ms',}}
{{                                   allow_truncated_timestamps=True,}}
{{                                   version='2.0',}}
{{                                   compression='SNAPPY')}}

> [Python] New version stores timestamps as epoch ms instead of ISO timestamp 
> string
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-5895
>                 URL: https://issues.apache.org/jira/browse/ARROW-5895
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>         Environment: Linux dev.office.whoop.com 3.10.0-957.21.3.el7.x86_64 #1 
> SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: John Wilson
>            Priority: Major
>
> Just upgraded from pyarrow 0.13 to 0.14.
> Columns of type TimestampType(timestamp[ns]) now get written as epoch ms 
> values: 
> 1561939200507
> Where 0.13 wrote TimestampType(timestamp[ns]) as an ISO string:
> 2019-07-01T00:00:00.507Z
> This broke my implementation.  How do I get pyarrow to write ISO strings 
> again in 0.14?
>  
> Here is my table write:
> {{ pyarrow.parquet.write_to_dataset(table=tbl, root_path=local_path,}}
> {{ partition_cols=['env', 'dt'],}}
> {{ coerce_timestamps='ms',}}
> {{ allow_truncated_timestamps=True,}}
> {{ version='2.0',}}
> {{ compression='SNAPPY')}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
