Yiannis Liodakis created ARROW-2020:
---------------------------------------

             Summary: pyarrow: Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
                 Key: ARROW-2020
                 URL: https://issues.apache.org/jira/browse/ARROW-2020
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: OS: Mac OS X 10.13.2
Python: 3.6.4
PyArrow: 0.8.0
            Reporter: Yiannis Liodakis
         Attachments: crash-report.txt

If you try to write a PyArrow table containing nanosecond-resolution timestamps to Parquet using `coerce_timestamps` and `use_deprecated_int96_timestamps=True`, the Arrow library will segfault. The crash does not happen if you skip the timestamp coercion or if you don't use 96-bit timestamps.

*To Reproduce:*

{code:python}
import datetime

import pyarrow
from pyarrow import parquet

schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('ns')),
])

data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
]

table = pyarrow.Table.from_arrays(data, ['last_updated'])

with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(
        table, fdesc,
        coerce_timestamps='us',  # 'ms' also reproduces the crash
        use_deprecated_int96_timestamps=True,
    )
{code}

See the attached file for the crash report.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
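A hedged workaround sketch (not from the reporter): per the description, the crash only occurs when `coerce_timestamps` and `use_deprecated_int96_timestamps=True` are combined, so dropping either option should avoid it. The example below keeps the coercion and leaves int96 timestamps off; the output file name is illustrative.

{code:python}
import datetime

import pyarrow
from pyarrow import parquet

# Same nanosecond-resolution column as in the repro above.
data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
]
table = pyarrow.Table.from_arrays(data, ['last_updated'])

# Workaround: coerce to microseconds but do NOT enable the deprecated
# int96 timestamps (keeping int96 and dropping the coercion should
# also avoid the crash, per the report).
parquet.write_table(table, 'test_file_ok.parquet', coerce_timestamps='us')
{code}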