[ https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881578#comment-16881578 ]
John Wilson commented on ARROW-5895:
------------------------------------
OK, so the problem is with S3.
I pull data from a Postgres DB and upload it to S3 as a Parquet file.
When I run an S3 Select query on a Parquet file written with 0.13.0, I get
back an ISO string:
[
{
"id": 516,
"ts": "2019-04-19T00:09:11.226Z",
...
}
]
After upgrading to 0.14, S3 interprets the field as an integer epoch value.
The code is exactly the same; only the pyarrow version has changed:
[
{
"id": 2383028,
"ts": 1561939200507,
...
}
]
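For comparing the two outputs, here is a minimal sketch of how the integer form maps back to the ISO form on the consumer side. It assumes the integer is epoch milliseconds in UTC (which matches the values in this report); the helper name `epoch_ms_to_iso` is made up for illustration and is not part of pyarrow or S3 Select.

```python
from datetime import datetime, timezone


def epoch_ms_to_iso(epoch_ms: int) -> str:
    """Format integer epoch milliseconds (assumed UTC) as an ISO 8601 string ending in 'Z'."""
    # Split with integer arithmetic so the millisecond part is not affected by float rounding.
    seconds, millis = divmod(epoch_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


# Value returned by S3 Select for the file written with 0.14
print(epoch_ms_to_iso(1561939200507))  # 2019-07-01T00:00:00.507Z
```

This only shows that both representations carry the same instant; it does not change how the column is written to the Parquet file.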
Here is a snippet of my write:
{{tbl = pyarrow.Table.from_pandas(df=df)}}
{{with tempfile.TemporaryDirectory() as local_path:}}
{{    pyarrow.parquet.write_to_dataset(table=tbl, root_path=local_path,}}
{{                                     partition_cols=['env', 'dt'],}}
{{                                     coerce_timestamps='ms',}}
{{                                     allow_truncated_timestamps=True,}}
{{                                     version='2.0',}}
{{                                     compression='SNAPPY')}}
> [Python] New version stores timestamps as epoch ms instead of ISO timestamp
> string
> ----------------------------------------------------------------------------------
>
> Key: ARROW-5895
> URL: https://issues.apache.org/jira/browse/ARROW-5895
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.0
> Environment: Linux dev.office.whoop.com 3.10.0-957.21.3.el7.x86_64 #1
> SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: John Wilson
> Priority: Major
>
> Just upgraded from pyarrow 0.13 to 0.14.
> Columns of type TimestampType(timestamp[ns]) now get written as epoch ms
> values:
> 1561939200507
> Where 0.13 wrote TimestampType(timestamp[ns]) as an ISO string:
> 2019-07-01T00:00:00.507Z
> This broke my implementation. How do I get pyarrow to write ISO strings
> again in 0.14?
>
> Here is my table write:
> {{pyarrow.parquet.write_to_dataset(table=tbl, root_path=local_path,}}
> {{                                 partition_cols=['env', 'dt'],}}
> {{                                 coerce_timestamps='ms',}}
> {{                                 allow_truncated_timestamps=True,}}
> {{                                 version='2.0',}}
> {{                                 compression='SNAPPY')}}
>