Yevgeni Litvin created ARROW-5260:
-------------------------------------

             Summary: [Python][C++] Crash when deserializing from components in a fresh new process
                 Key: ARROW-5260
                 URL: https://issues.apache.org/jira/browse/ARROW-5260
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.12.1, 0.13.0, 0.12.0
            Reporter: Yevgeni Litvin


Deserializing a table from components in a fresh process crashes with 
SIGSEGV:
{noformat}
#1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
#2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
#3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#4 0x00000000004ad919 in PyCFunction_Call ()
#5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
#10 0x0000000000545e34 in ?? ()
#11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
#12 0x0000000000545a51 in ?? ()
#13 0x0000000000546890 in PyEval_EvalCode ()
#14 0x000000000042a9a8 in PyRun_FileExFlags ()
#15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
#16 0x000000000043e0ba in Py_Main ()
#17 0x0000000000421b04 in main ()
{noformat}
The following script (saved as pa_serialization_crashes.py) reproduces the issue:
{code:java}
import pickle
import sys

import pandas as pd
import pyarrow as pa

if __name__ == '__main__':
    if sys.argv[1] == 'w':
        df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
        table = pa.Table.from_pandas(df)
        table_serialized = pa.serialize(table)
        table_serialized_components = table_serialized.to_components()
        with open('/tmp/p.pickle', 'wb') as f:
            pickle.dump(table_serialized_components, f)
        print('/tmp/p.pickle written ok')

    if sys.argv[1] == 'r':
        # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
        #pa.serialize(0)
        with open('/tmp/p.pickle', 'rb') as f:
            table_serialized_components = pickle.load(f)
        table = pa.deserialize_components(table_serialized_components)
        print(table)

{code}
Then run:
{code:java}
$ python pa_serialization_crashes.py w
/tmp/p.pickle written ok

$ python pa_serialization_crashes.py r
Segmentation fault (core dumped){code}
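
The two-step run above can also be checked from a script. The sketch below is not part of the original report: it starts a fresh interpreter and reports whether the child exited normally or was killed by a signal. The helper name run_in_fresh_process is hypothetical, and the sketch assumes a POSIX system, where subprocess reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_in_fresh_process(args):
    """Run `python <args>` in a new interpreter and describe how it ended.

    Hypothetical helper for illustration; `args` is the list of arguments
    passed to the interpreter (for example a script path and its options).
    """
    proc = subprocess.run(
        [sys.executable, *args], capture_output=True, text=True
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child was
        # terminated by signal N.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit code %d" % proc.returncode
```

On an affected pyarrow version, calling run_in_fresh_process(["pa_serialization_crashes.py", "r"]) after the "w" step would be expected to report a SIGSEGV rather than a normal exit, matching the shell output above.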
The crash does not occur if any unrelated data is serialized before the 
deserialization (see the commented-out pa.serialize(0) line in the 
reproduction script).
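
As a stopgap, the workaround can be wrapped in a small helper. This is only a sketch under assumptions: the helper name is hypothetical, it assumes one of the affected pyarrow versions where pa.serialize and pa.deserialize_components exist as used in the reproduction script, and it takes the imported pyarrow module as an argument so that the helper itself has no import-time dependency. It does not address the underlying bug.

```python
def deserialize_components_with_warmup(pa, components):
    """Apply the workaround from this report, then deserialize.

    Hypothetical helper: `pa` is the imported pyarrow module and
    `components` is the object returned by
    SerializedPyObject.to_components() (possibly after a pickle round trip).
    """
    # Workaround observed in the report: serializing any unrelated value
    # first avoids the crash in a fresh process.
    pa.serialize(0)
    return pa.deserialize_components(components)
```

In the reader branch of the reproduction script, this would replace the direct call, for example: table = deserialize_components_with_warmup(pa, table_serialized_components).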

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
