Yiannis Liodakis created ARROW-2020:
---------------------------------------

             Summary: pyarrow: Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
                 Key: ARROW-2020
                 URL: https://issues.apache.org/jira/browse/ARROW-2020
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: OS: Mac OS X 10.13.2
Python: 3.6.4
PyArrow: 0.8.0
            Reporter: Yiannis Liodakis
         Attachments: crash-report.txt

If you try to write a PyArrow table containing nanosecond-resolution timestamps 
to Parquet using `coerce_timestamps` and 
`use_deprecated_int96_timestamps=True`, the Arrow library will segfault.

The crash doesn't happen if you don't coerce the timestamp resolution or if you 
don't use 96-bit timestamps.

*To Reproduce:*

{code:python}
import datetime

import pyarrow
from pyarrow import parquet

schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('ns')),
])

data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
]

table = pyarrow.Table.from_arrays(data, ['last_updated'])

with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(table, fdesc,
                        coerce_timestamps='us',  # 'ms' works too
                        use_deprecated_int96_timestamps=True)
{code}

See attached file for the crash report.
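
As noted above, the crash only occurs when both options are combined. A minimal workaround sketch against the table built in the reproduction above, using only one option per call (the output file names are illustrative, not from the original report):

{code:python}
# Workaround sketch (not a fix for the underlying bug): each call uses only
# one of the two options, so the segfault is not triggered.

# 1) Keep the deprecated int96 encoding but skip the coercion step:
parquet.write_table(table, 'int96_only.parquet',
                    use_deprecated_int96_timestamps=True)

# 2) Keep the coercion to microseconds but write standard int64 timestamps:
parquet.write_table(table, 'coerce_only.parquet',
                    coerce_timestamps='us')
{code}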
