[ https://issues.apache.org/jira/browse/ARROW-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439852#comment-16439852 ]
ASF GitHub Bot commented on ARROW-2082:
---------------------------------------
joshuastorck opened a new pull request #456: ARROW-2082: Prevent segfault that
was occurring when writing a nanosecond timestamp with arrow writer properties
set to coerce timestamps and support deprecated int96 timestamps.
URL: https://github.com/apache/parquet-cpp/pull/456
The bug occurred because the physical type was int64, but the WriteTimestamps
function took a code path that assumed the physical type was int96. This caused
memory corruption because it wrote past the end of the array. The fix is to
check that coerce timestamps is disabled before taking the int96 write path.
A unit test was added for the regression.
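
The mismatch described above can be modelled with a small Python sketch. This
is not the parquet-cpp code: every name below is hypothetical, and the rule
that coercion selects int64 is an assumption inferred from the description
(the column was int64 while the writer assumed int96). It only illustrates why
consulting the int96 flag alone overruns a buffer sized for int64 values.

```python
from enum import Enum
from typing import Optional


class PhysicalType(Enum):
    """Physical storage types; the value is the width of one element in bytes."""
    INT64 = 8
    INT96 = 12


def declared_physical_type(use_int96: bool, coerce_unit: Optional[str]) -> PhysicalType:
    """Type the column is declared with (assumption: coercion implies int64)."""
    if use_int96 and coerce_unit is None:
        return PhysicalType.INT96
    return PhysicalType.INT64


def buggy_write_path(use_int96: bool, coerce_unit: Optional[str]) -> PhysicalType:
    """Hypothetical pre-fix dispatch: only the int96 flag is consulted."""
    return PhysicalType.INT96 if use_int96 else PhysicalType.INT64


def fixed_write_path(use_int96: bool, coerce_unit: Optional[str]) -> PhysicalType:
    """Hypothetical post-fix dispatch: int96 only when coercion is disabled."""
    if use_int96 and coerce_unit is None:
        return PhysicalType.INT96
    return PhysicalType.INT64


def bytes_overrun(num_values: int, declared: PhysicalType, written: PhysicalType) -> int:
    """Bytes written past a buffer sized for `declared` when writing `written`."""
    return max(0, num_values * (written.value - declared.value))


# Options from the report: int96 timestamps requested together with coercion to 'ms'.
declared = declared_physical_type(use_int96=True, coerce_unit="ms")
before_fix = bytes_overrun(3, declared, buggy_write_path(True, "ms"))
after_fix = bytes_overrun(3, declared, fixed_write_path(True, "ms"))
print(declared.name, before_fix, after_fix)
```

In this model, the pre-fix dispatch writes 12-byte elements into storage sized
for 8-byte elements, so each value spills 4 bytes past its slot; the post-fix
dispatch agrees with the declared type and writes nothing out of bounds.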
> [Python] SegFault in pyarrow.parquet.write_table with specific options
> ----------------------------------------------------------------------
>
> Key: ARROW-2082
> URL: https://issues.apache.org/jira/browse/ARROW-2082
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Environment: tested on MacOS High Sierra with python 3.6 and Ubuntu
> Xenial (Python 3.5)
> Reporter: Clément Bouscasse
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I originally filed an issue in the pandas project but we've tracked it down
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> Basically, calling
> {code:python}
> df.to_parquet('filename.parquet', flavor='spark'){code}
> causes a segfault if `df` contains a datetime column.
> Under the covers, pandas translates this to the following call:
> {code:python}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy',
> coerce_timestamps='ms')
> {code}
> which crashes immediately.
> There is a repro on the github ticket.
>
>