[ 
https://issues.apache.org/jira/browse/ARROW-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315426#comment-16315426
 ] 

ASF GitHub Bot commented on ARROW-1972:
---------------------------------------

xhochy commented on a change in pull request #1463: ARROW-1972: [Python] Import 
pyarrow in DeserializeObject.
URL: https://github.com/apache/arrow/pull/1463#discussion_r160057510
 
 

 ##########
 File path: cpp/src/arrow/python/arrow_to_python.cc
 ##########
 @@ -284,6 +284,7 @@ Status DeserializeObject(PyObject* context, const SerializedPyObject& obj, PyObj
                          PyObject** out) {
   PyAcquireGIL lock;
   PyDateTime_IMPORT;
+  import_pyarrow();
   return DeserializeList(context, *obj.batch->column(0), 0, obj.batch->num_rows(), base,
 
 Review comment:
   I'm confused about why we need an `import_pyarrow` here. Can someone explain 
this in more detail, so that I know to watch out for this problem in the 
future? I would have assumed that all Arrow symbols are already imported 
whenever any code in this `.cc` file is called.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Deserialization of buffer objects (and pandas dataframes) segfaults on 
> different processes.
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1972
>                 URL: https://issues.apache.org/jira/browse/ARROW-1972
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Robert Nishihara
>              Labels: pull-request-available
>
> To see the issue, first serialize a pyarrow buffer.
> {code}
> import pyarrow as pa
> serialized = pa.serialize(pa.frombuffer(b'hello')).to_buffer().to_pybytes()
> print(serialized)  # b'\x00\x00\x00\x00\x01...'
> {code}
> Deserializing it within the same process succeeds. However, deserializing it 
> in a **separate process** causes a segfault. For example:
> {code}
> import pyarrow as pa
> pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This segfaults
> {code}
> The backtrace is
> {code}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0x0)
>   * frame #0: 0x0000000000000000
>     frame #1: 0x0000000105605534 
> libarrow_python.0.dylib`arrow::py::wrap_buffer(buffer=std::__1::shared_ptr<arrow::Buffer>::element_type
>  @ 0x000000010060c348 strong=1 weak=1) at pyarrow.cc:48
>     frame #2: 0x000000010554fdee 
> libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, 
> parent=0x0000000100645438, arr=0x0000000100622938, index=0, type=0, 
> base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfd218) 
> at arrow_to_python.cc:173
>     frame #3: 0x000000010554d93a 
> libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818,
>  array=0x0000000100645438, start_idx=0, stop_idx=2, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfd470) at arrow_to_python.cc:208
>     frame #4: 0x000000010554d302 
> libarrow_python.0.dylib`arrow::py::DeserializeDict(context=0x0000000108f17818,
>  array=0x0000000100645338, start_idx=0, stop_idx=2, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfddd8) at arrow_to_python.cc:74
>     frame #5: 0x000000010554f249 
> libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, 
> parent=0x00000001006377a8, arr=0x0000000100645298, index=0, type=0, 
> base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfddd8) 
> at arrow_to_python.cc:158
>     frame #6: 0x000000010554d93a 
> libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818,
>  array=0x00000001006377a8, start_idx=0, stop_idx=1, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:208
>     frame #7: 0x0000000105551fbf 
> libarrow_python.0.dylib`arrow::py::DeserializeObject(context=0x0000000108f17818,
>  obj=0x0000000108f09588, base=0x0000000108f0e528, out=0x00007fff5fbfdfe8) at 
> arrow_to_python.cc:287
>     frame #8: 0x0000000104abecae 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_18SerializedPyObject_2deserialize(__pyx_v_self=0x0000000108f09570,
>  __pyx_v_context=0x0000000108f17818) at lib.cxx:88592
>     frame #9: 0x0000000104abdec4 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_18SerializedPyObject_3deserialize(__pyx_v_self=0x0000000108f09570,
>  __pyx_args=0x000000010231f358, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:88514
>     frame #10: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #11: 0x0000000104941208 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108f302d0, 
> arg=0x000000010231f358, kw=0x0000000000000000) at lib.cxx:116108
>     frame #12: 0x0000000104b0e3fa 
> lib.cpython-36m-darwin.so`__Pyx__PyObject_CallOneArg(func=0x0000000108f302d0, 
> arg=0x0000000108f17818) at lib.cxx:116147
>     frame #13: 0x0000000104944bc6 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_CallOneArg(func=0x0000000108f302d0, 
> arg=0x0000000108f17818) at lib.cxx:116166
>     frame #14: 0x0000000104b09873 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_124deserialize_from(__pyx_self=0x0000000000000000,
>  __pyx_v_source=0x0000000108ddeee8, __pyx_v_base=0x0000000108f0e528, 
> __pyx_v_context=0x0000000108f17818) at lib.cxx:90327
>     frame #15: 0x0000000104b09310 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_125deserialize_from(__pyx_self=0x0000000000000000,
>  __pyx_args=0x0000000108f10d38, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:90260
>     frame #16: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #17: 0x0000000104941208 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf1b0, 
> arg=0x0000000108f10d38, kw=0x0000000000000000) at lib.cxx:116108
>     frame #18: 0x0000000104b0bf9d 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_128deserialize(__pyx_self=0x0000000000000000,
>  __pyx_v_obj=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at 
> lib.cxx:90770
>     frame #19: 0x0000000104b0b7ec 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_129deserialize(__pyx_self=0x0000000000000000,
>  __pyx_args=0x0000000108def1c8, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:90680
>     frame #20: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #21: 0x0000000108d5c468 
> plasma.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf240, 
> arg=0x0000000108def1c8, kw=0x0000000000000000) at plasma.cxx:11200
>     frame #22: 0x0000000108d744a7 
> plasma.cpython-36m-darwin.so`__pyx_pf_7pyarrow_6plasma_12PlasmaClient_10get(__pyx_v_self=0x0000000108f0e210,
>  __pyx_v_object_ids=0x0000000108deb248, __pyx_v_timeout_ms=0, 
> __pyx_v_serialization_context=0x0000000108f17818) at plasma.cxx:6480
>     frame #23: 0x0000000108d6c250 
> plasma.cpython-36m-darwin.so`__pyx_pw_7pyarrow_6plasma_12PlasmaClient_11get(__pyx_v_self=0x0000000108f0e210,
>  __pyx_args=0x0000000102363630, __pyx_kwds=0x0000000000000000) at 
> plasma.cxx:6274
>     frame #24: 0x000000010008bc5b python`_PyCFunction_FastCallDict + 363
>     frame #25: 0x00000001001637f2 python`call_function + 146
>     frame #26: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #27: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #28: 0x0000000100163c4c python`fast_function + 348
>     frame #29: 0x000000010016383e python`call_function + 222
>     frame #30: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #31: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #32: 0x0000000100163c4c python`fast_function + 348
>     frame #33: 0x000000010016383e python`call_function + 222
>     frame #34: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #35: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #36: 0x0000000100163c4c python`fast_function + 348
>     frame #37: 0x000000010016383e python`call_function + 222
>     frame #38: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #39: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #40: 0x00000001001b01dc python`PyRun_InteractiveOneObject + 1132
>     frame #41: 0x00000001001ad15e python`PyRun_InteractiveLoopFlags + 334
>     frame #42: 0x00000001001acfeb python`PyRun_AnyFileExFlags + 139
>     frame #43: 0x00000001001d3378 python`Py_Main + 4632
>     frame #44: 0x00000001000016bd python`main + 509
>     frame #45: 0x00007fffb6073235 libdyld.dylib`start + 1
> {code}
> Note, however, that deserialization works if the same process serializes 
> something first. For example, the following succeeds.
> {code}
> import pyarrow as pa
> pa.serialize(1)
> pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This succeeds!
> {code}
> I have a potential fix/workaround, which I will post momentarily.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
