[ 
https://issues.apache.org/jira/browse/ARROW-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315475#comment-16315475
 ] 

ASF GitHub Bot commented on ARROW-1972:
---------------------------------------

pcmoritz commented on a change in pull request #1463: ARROW-1972: [Python] 
Import pyarrow in DeserializeObject.
URL: https://github.com/apache/arrow/pull/1463#discussion_r160060200
 
 

 ##########
 File path: cpp/src/arrow/python/arrow_to_python.cc
 ##########
 @@ -284,6 +284,7 @@ Status DeserializeObject(PyObject* context, const 
SerializedPyObject& obj, PyObj
                          PyObject** out) {
   PyAcquireGIL lock;
   PyDateTime_IMPORT;
+  import_pyarrow();
   return DeserializeList(context, *obj.batch->column(0), 0, 
obj.batch->num_rows(), base,
 
 Review comment:
   This is needed because import_pyarrow() imports the wrap_buffer and 
unwrap_buffer symbols from the Cython extension, which the buffer handling 
code of the Deserialize calls depends on. The serialization codepath already 
does this, so serializing something and then deserializing it works. 
However, if we start a new Python interpreter in which the serialize codepath 
was never called and try to deserialize an object that contains a buffer, it 
segfaults because import_pyarrow was never called.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Deserialization of buffer objects (and pandas dataframes) segfaults on 
> different processes.
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1972
>                 URL: https://issues.apache.org/jira/browse/ARROW-1972
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Robert Nishihara
>              Labels: pull-request-available
>
> To see the issue, first serialize a pyarrow buffer.
> {code}
> import pyarrow as pa
> serialized = pa.serialize(pa.frombuffer(b'hello')).to_buffer().to_pybytes()
> print(serialized)  # b'\x00\x00\x00\x00\x01...'
> {code}
> Deserializing it within the same process succeeds; however, deserializing it 
> in a **separate process** causes a segfault. E.g.,
> {code}
> import pyarrow as pa
> pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This segfaults
> {code}
> The backtrace is
> {code}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x0)
>   * frame #0: 0x0000000000000000
>     frame #1: 0x0000000105605534 
> libarrow_python.0.dylib`arrow::py::wrap_buffer(buffer=std::__1::shared_ptr<arrow::Buffer>::element_type
>  @ 0x000000010060c348 strong=1 weak=1) at pyarrow.cc:48
>     frame #2: 0x000000010554fdee 
> libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, 
> parent=0x0000000100645438, arr=0x0000000100622938, index=0, type=0, 
> base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfd218) 
> at arrow_to_python.cc:173
>     frame #3: 0x000000010554d93a 
> libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818,
>  array=0x0000000100645438, start_idx=0, stop_idx=2, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfd470) at arrow_to_python.cc:208
>     frame #4: 0x000000010554d302 
> libarrow_python.0.dylib`arrow::py::DeserializeDict(context=0x0000000108f17818,
>  array=0x0000000100645338, start_idx=0, stop_idx=2, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfddd8) at arrow_to_python.cc:74
>     frame #5: 0x000000010554f249 
> libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, 
> parent=0x00000001006377a8, arr=0x0000000100645298, index=0, type=0, 
> base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfddd8) 
> at arrow_to_python.cc:158
>     frame #6: 0x000000010554d93a 
> libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818,
>  array=0x00000001006377a8, start_idx=0, stop_idx=1, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:208
>     frame #7: 0x0000000105551fbf 
> libarrow_python.0.dylib`arrow::py::DeserializeObject(context=0x0000000108f17818,
>  obj=0x0000000108f09588, base=0x0000000108f0e528, out=0x00007fff5fbfdfe8) at 
> arrow_to_python.cc:287
>     frame #8: 0x0000000104abecae 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_18SerializedPyObject_2deserialize(__pyx_v_self=0x0000000108f09570,
>  __pyx_v_context=0x0000000108f17818) at lib.cxx:88592
>     frame #9: 0x0000000104abdec4 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_18SerializedPyObject_3deserialize(__pyx_v_self=0x0000000108f09570,
>  __pyx_args=0x000000010231f358, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:88514
>     frame #10: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #11: 0x0000000104941208 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108f302d0, 
> arg=0x000000010231f358, kw=0x0000000000000000) at lib.cxx:116108
>     frame #12: 0x0000000104b0e3fa 
> lib.cpython-36m-darwin.so`__Pyx__PyObject_CallOneArg(func=0x0000000108f302d0, 
> arg=0x0000000108f17818) at lib.cxx:116147
>     frame #13: 0x0000000104944bc6 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_CallOneArg(func=0x0000000108f302d0, 
> arg=0x0000000108f17818) at lib.cxx:116166
>     frame #14: 0x0000000104b09873 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_124deserialize_from(__pyx_self=0x0000000000000000,
>  __pyx_v_source=0x0000000108ddeee8, __pyx_v_base=0x0000000108f0e528, 
> __pyx_v_context=0x0000000108f17818) at lib.cxx:90327
>     frame #15: 0x0000000104b09310 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_125deserialize_from(__pyx_self=0x0000000000000000,
>  __pyx_args=0x0000000108f10d38, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:90260
>     frame #16: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #17: 0x0000000104941208 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf1b0, 
> arg=0x0000000108f10d38, kw=0x0000000000000000) at lib.cxx:116108
>     frame #18: 0x0000000104b0bf9d 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_128deserialize(__pyx_self=0x0000000000000000,
>  __pyx_v_obj=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at 
> lib.cxx:90770
>     frame #19: 0x0000000104b0b7ec 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_129deserialize(__pyx_self=0x0000000000000000,
>  __pyx_args=0x0000000108def1c8, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:90680
>     frame #20: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #21: 0x0000000108d5c468 
> plasma.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf240, 
> arg=0x0000000108def1c8, kw=0x0000000000000000) at plasma.cxx:11200
>     frame #22: 0x0000000108d744a7 
> plasma.cpython-36m-darwin.so`__pyx_pf_7pyarrow_6plasma_12PlasmaClient_10get(__pyx_v_self=0x0000000108f0e210,
>  __pyx_v_object_ids=0x0000000108deb248, __pyx_v_timeout_ms=0, 
> __pyx_v_serialization_context=0x0000000108f17818) at plasma.cxx:6480
>     frame #23: 0x0000000108d6c250 
> plasma.cpython-36m-darwin.so`__pyx_pw_7pyarrow_6plasma_12PlasmaClient_11get(__pyx_v_self=0x0000000108f0e210,
>  __pyx_args=0x0000000102363630, __pyx_kwds=0x0000000000000000) at 
> plasma.cxx:6274
>     frame #24: 0x000000010008bc5b python`_PyCFunction_FastCallDict + 363
>     frame #25: 0x00000001001637f2 python`call_function + 146
>     frame #26: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #27: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #28: 0x0000000100163c4c python`fast_function + 348
>     frame #29: 0x000000010016383e python`call_function + 222
>     frame #30: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #31: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #32: 0x0000000100163c4c python`fast_function + 348
>     frame #33: 0x000000010016383e python`call_function + 222
>     frame #34: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #35: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #36: 0x0000000100163c4c python`fast_function + 348
>     frame #37: 0x000000010016383e python`call_function + 222
>     frame #38: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #39: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #40: 0x00000001001b01dc python`PyRun_InteractiveOneObject + 1132
>     frame #41: 0x00000001001ad15e python`PyRun_InteractiveLoopFlags + 334
>     frame #42: 0x00000001001acfeb python`PyRun_AnyFileExFlags + 139
>     frame #43: 0x00000001001d3378 python`Py_Main + 4632
>     frame #44: 0x00000001000016bd python`main + 509
>     frame #45: 0x00007fffb6073235 libdyld.dylib`start + 1
> {code}
> Note however that if we first serialize something, then it works. E.g., the 
> following succeeds.
> {code}
> import pyarrow as pa
> pa.serialize(1)
> pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This succeeds!
> {code}
> I have a potential fix/workaround, which I will post momentarily.
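
Because the crash only appears in a process that has never serialized anything, one way to check for it is to run the deserialize call in a fresh interpreter and inspect the exit status. The following is a minimal sketch using only the standard library; `run_in_fresh_interpreter` and `describe` are hypothetical helper names, and the interpretation of negative return codes assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_in_fresh_interpreter(code):
    """Run `code` in a new Python process and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


def describe(returncode):
    # On POSIX, a negative return code means the child was killed by
    # the signal with that number (e.g. SIGSEGV for a segfault).
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)
```

To apply it to this issue, the `code` string would contain the deserialize snippet from the description, with the full serialized bytes in place of the elided payload; a segfault should then be reported as `killed by SIGSEGV`.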



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
