[ https://issues.apache.org/jira/browse/ARROW-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-5260:
------------------------------
External issue URL: https://github.com/apache/arrow/issues/21731
> [Python][C++] Crash when deserializing from components in a fresh new process
> -----------------------------------------------------------------------------
>
> Key: ARROW-5260
> URL: https://issues.apache.org/jira/browse/ARROW-5260
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.12.0, 0.12.1, 0.13.0
> Reporter: Yevgeni Litvin
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Deserializing a table from its components in a fresh process crashes with
> SIGSEGV:
> {noformat}
> #1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
> #2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
> #3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #4 0x00000000004ad919 in PyCFunction_Call ()
> #5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
> #9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
> #10 0x0000000000545e34 in ?? ()
> #11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
> #12 0x0000000000545a51 in ?? ()
> #13 0x0000000000546890 in PyEval_EvalCode ()
> #14 0x000000000042a9a8 in PyRun_FileExFlags ()
> #15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
> #16 0x000000000043e0ba in Py_Main ()
> #17 0x0000000000421b04 in main ()
> {noformat}
> The following snippet can be used to reproduce the issue:
> {code:python}
> import pickle
> import sys
>
> import pandas as pd
> import pyarrow as pa
>
> if __name__ == '__main__':
>     if sys.argv[1] == 'w':
>         # Writer: serialize a small table and pickle its components to disk.
>         df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
>         table = pa.Table.from_pandas(df)
>         table_serialized = pa.serialize(table)
>         table_serialized_components = table_serialized.to_components()
>         with open('/tmp/p.pickle', 'wb') as f:
>             pickle.dump(table_serialized_components, f)
>         print('/tmp/p.pickle written ok')
>     if sys.argv[1] == 'r':
>         # Reader: run in a fresh process to trigger the crash.
>         # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
>         # pa.serialize(0)
>         with open('/tmp/p.pickle', 'rb') as f:
>             table_serialized_components = pickle.load(f)
>         table = pa.deserialize_components(table_serialized_components)
>         print(table)
> {code}
> Save it as pa_serialization_crashes.py, then run:
> {code:bash}
> $ python pa_serialization_crashes.py w
> /tmp/p.pickle written ok
> $ python pa_serialization_crashes.py r
> Segmentation fault (core dumped)
> {code}
> The crash does not occur if unrelated data is serialized before deserializing
> (see the commented-out pa.serialize(0) line in the reproduction script).
>
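Because the crash only reproduces in a fresh interpreter, checking for it is easiest to automate by spawning a new Python process and inspecting how it exited. The following is a minimal sketch, assuming a POSIX system (where `subprocess` reports a child killed by signal N as return code -N); the helper name `dies_with_sigsegv` is hypothetical and not part of pyarrow or the original report.

```python
import signal
import subprocess
import sys


def dies_with_sigsegv(source: str, timeout: float = 60.0) -> bool:
    """Run `source` in a brand-new Python interpreter (hypothetical helper).

    Returns True if the child process was killed by SIGSEGV. A fresh
    interpreter matters because the reported crash depends on process state:
    serializing unrelated data first is said to avoid it.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the child's output from cluttering ours
        timeout=timeout,
    )
    # On POSIX, a child terminated by signal N has returncode == -N.
    return result.returncode == -signal.SIGSEGV
```

With the reproduction script saved as `pa_serialization_crashes.py` and its `w` step already run, the `r` step could be checked with `dies_with_sigsegv("import runpy, sys; sys.argv = ['repro', 'r']; runpy.run_path('pa_serialization_crashes.py', run_name='__main__')")`, which should return `True` on the affected versions if the behavior described in the report holds.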