[ 
https://issues.apache.org/jira/browse/ARROW-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315485#comment-16315485
 ] 

ASF GitHub Bot commented on ARROW-1972:
---------------------------------------

pcmoritz closed pull request #1463: ARROW-1972: [Python] Import pyarrow in 
DeserializeObject.
URL: https://github.com/apache/arrow/pull/1463
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/cpp/src/arrow/python/arrow_to_python.cc b/cpp/src/arrow/python/arrow_to_python.cc
index ce539a597..c060ab8bf 100644
--- a/cpp/src/arrow/python/arrow_to_python.cc
+++ b/cpp/src/arrow/python/arrow_to_python.cc
@@ -284,6 +284,7 @@ Status DeserializeObject(PyObject* context, const SerializedPyObject& obj, PyObj
                          PyObject** out) {
   PyAcquireGIL lock;
   PyDateTime_IMPORT;
+  import_pyarrow();
   return DeserializeList(context, *obj.batch->column(0), 0, obj.batch->num_rows(), base,
                          obj, out);
 }
diff --git a/python/pyarrow/tests/deserialize_buffer.py b/python/pyarrow/tests/deserialize_buffer.py
new file mode 100644
index 000000000..982dc6695
--- /dev/null
+++ b/python/pyarrow/tests/deserialize_buffer.py
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This file is called from a test in test_serialization.py.
+
+import sys
+
+import pyarrow as pa
+
+with open(sys.argv[1], 'rb') as f:
+    data = f.read()
+    pa.deserialize(data)
diff --git a/python/pyarrow/tests/test_serialization.py b/python/pyarrow/tests/test_serialization.py
index f245dc299..611655638 100644
--- a/python/pyarrow/tests/test_serialization.py
+++ b/python/pyarrow/tests/test_serialization.py
@@ -541,3 +541,17 @@ def deserialize_regex(serialized, q):
     p.start()
     assert q.get().pattern == regex.pattern
     p.join()
+
+
+def test_deserialize_buffer_in_different_process():
+    import tempfile
+    import subprocess
+
+    f = tempfile.NamedTemporaryFile(delete=False)
+    b = pa.serialize(pa.frombuffer(b'hello')).to_buffer()
+    f.write(b.to_pybytes())
+    f.close()
+
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    python_file = os.path.join(dir_path, 'deserialize_buffer.py')
+    subprocess.check_call(['python', python_file, f.name])
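
The test above launches the child with the literal command `python`, so it
depends on whichever interpreter is first on PATH. Below is a minimal sketch
of the same cross-process pattern using only the standard library. The names
`run_in_fresh_interpreter` and `CHILD_SOURCE` are hypothetical, and the child
only reads the payload file instead of calling `pa.deserialize`, so the sketch
runs without pyarrow installed.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for deserialize_buffer.py: reads the payload file named in argv[1].
# In the real test the child would call pa.deserialize(data) at this point.
CHILD_SOURCE = """
import sys
with open(sys.argv[1], 'rb') as f:
    data = f.read()
sys.exit(0 if data == b'hello' else 1)
"""


def run_in_fresh_interpreter(child_source, payload):
    """Write payload to a temp file, run child_source on it in a new
    interpreter, and return the child's exit code."""
    paths = []
    try:
        for content in (payload, child_source.encode('utf-8')):
            f = tempfile.NamedTemporaryFile(delete=False)
            paths.append(f.name)
            with f:
                f.write(content)
        payload_path, script_path = paths
        # sys.executable is the interpreter running this code, so the child
        # sees the same installed packages as the parent.
        return subprocess.call([sys.executable, script_path, payload_path])
    finally:
        # Remove both temp files whether or not the child succeeded.
        for p in paths:
            os.unlink(p)
```

On POSIX, a child terminated by a signal is reported with a negative return
code, so a crash in the child would surface as a nonzero result in the parent
rather than taking down the test runner.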


 



> Deserialization of buffer objects (and pandas dataframes) segfaults on different processes.
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1972
>                 URL: https://issues.apache.org/jira/browse/ARROW-1972
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Robert Nishihara
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> To see the issue, first serialize a pyarrow buffer.
> {code}
> import pyarrow as pa
> serialized = pa.serialize(pa.frombuffer(b'hello')).to_buffer().to_pybytes()
> print(serialized)  # b'\x00\x00\x00\x00\x01...'
> {code}
> Deserializing it within the same process succeeds; however, deserializing it 
> in a **separate process** causes a segfault. E.g.,
> {code}
> import pyarrow as pa
> pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This segfaults
> {code}
> The backtrace is
> {code}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x0000000000000000
>     frame #1: 0x0000000105605534 
> libarrow_python.0.dylib`arrow::py::wrap_buffer(buffer=std::__1::shared_ptr<arrow::Buffer>::element_type
>  @ 0x000000010060c348 strong=1 weak=1) at pyarrow.cc:48
>     frame #2: 0x000000010554fdee 
> libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, 
> parent=0x0000000100645438, arr=0x0000000100622938, index=0, type=0, 
> base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfd218) 
> at arrow_to_python.cc:173
>     frame #3: 0x000000010554d93a 
> libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818,
>  array=0x0000000100645438, start_idx=0, stop_idx=2, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfd470) at arrow_to_python.cc:208
>     frame #4: 0x000000010554d302 
> libarrow_python.0.dylib`arrow::py::DeserializeDict(context=0x0000000108f17818,
>  array=0x0000000100645338, start_idx=0, stop_idx=2, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfddd8) at arrow_to_python.cc:74
>     frame #5: 0x000000010554f249 
> libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, 
> parent=0x00000001006377a8, arr=0x0000000100645298, index=0, type=0, 
> base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfddd8) 
> at arrow_to_python.cc:158
>     frame #6: 0x000000010554d93a 
> libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818,
>  array=0x00000001006377a8, start_idx=0, stop_idx=1, base=0x0000000108f0e528, 
> blobs=0x0000000108f09588, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:208
>     frame #7: 0x0000000105551fbf 
> libarrow_python.0.dylib`arrow::py::DeserializeObject(context=0x0000000108f17818,
>  obj=0x0000000108f09588, base=0x0000000108f0e528, out=0x00007fff5fbfdfe8) at 
> arrow_to_python.cc:287
>     frame #8: 0x0000000104abecae 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_18SerializedPyObject_2deserialize(__pyx_v_self=0x0000000108f09570,
>  __pyx_v_context=0x0000000108f17818) at lib.cxx:88592
>     frame #9: 0x0000000104abdec4 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_18SerializedPyObject_3deserialize(__pyx_v_self=0x0000000108f09570,
>  __pyx_args=0x000000010231f358, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:88514
>     frame #10: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #11: 0x0000000104941208 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108f302d0, 
> arg=0x000000010231f358, kw=0x0000000000000000) at lib.cxx:116108
>     frame #12: 0x0000000104b0e3fa 
> lib.cpython-36m-darwin.so`__Pyx__PyObject_CallOneArg(func=0x0000000108f302d0, 
> arg=0x0000000108f17818) at lib.cxx:116147
>     frame #13: 0x0000000104944bc6 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_CallOneArg(func=0x0000000108f302d0, 
> arg=0x0000000108f17818) at lib.cxx:116166
>     frame #14: 0x0000000104b09873 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_124deserialize_from(__pyx_self=0x0000000000000000,
>  __pyx_v_source=0x0000000108ddeee8, __pyx_v_base=0x0000000108f0e528, 
> __pyx_v_context=0x0000000108f17818) at lib.cxx:90327
>     frame #15: 0x0000000104b09310 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_125deserialize_from(__pyx_self=0x0000000000000000,
>  __pyx_args=0x0000000108f10d38, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:90260
>     frame #16: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #17: 0x0000000104941208 
> lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf1b0, 
> arg=0x0000000108f10d38, kw=0x0000000000000000) at lib.cxx:116108
>     frame #18: 0x0000000104b0bf9d 
> lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_128deserialize(__pyx_self=0x0000000000000000,
>  __pyx_v_obj=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at 
> lib.cxx:90770
>     frame #19: 0x0000000104b0b7ec 
> lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_129deserialize(__pyx_self=0x0000000000000000,
>  __pyx_args=0x0000000108def1c8, __pyx_kwds=0x0000000000000000) at 
> lib.cxx:90680
>     frame #20: 0x000000010008b5f1 python`PyCFunction_Call + 145
>     frame #21: 0x0000000108d5c468 
> plasma.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf240, 
> arg=0x0000000108def1c8, kw=0x0000000000000000) at plasma.cxx:11200
>     frame #22: 0x0000000108d744a7 
> plasma.cpython-36m-darwin.so`__pyx_pf_7pyarrow_6plasma_12PlasmaClient_10get(__pyx_v_self=0x0000000108f0e210,
>  __pyx_v_object_ids=0x0000000108deb248, __pyx_v_timeout_ms=0, 
> __pyx_v_serialization_context=0x0000000108f17818) at plasma.cxx:6480
>     frame #23: 0x0000000108d6c250 
> plasma.cpython-36m-darwin.so`__pyx_pw_7pyarrow_6plasma_12PlasmaClient_11get(__pyx_v_self=0x0000000108f0e210,
>  __pyx_args=0x0000000102363630, __pyx_kwds=0x0000000000000000) at 
> plasma.cxx:6274
>     frame #24: 0x000000010008bc5b python`_PyCFunction_FastCallDict + 363
>     frame #25: 0x00000001001637f2 python`call_function + 146
>     frame #26: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #27: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #28: 0x0000000100163c4c python`fast_function + 348
>     frame #29: 0x000000010016383e python`call_function + 222
>     frame #30: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #31: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #32: 0x0000000100163c4c python`fast_function + 348
>     frame #33: 0x000000010016383e python`call_function + 222
>     frame #34: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #35: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #36: 0x0000000100163c4c python`fast_function + 348
>     frame #37: 0x000000010016383e python`call_function + 222
>     frame #38: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
>     frame #39: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
>     frame #40: 0x00000001001b01dc python`PyRun_InteractiveOneObject + 1132
>     frame #41: 0x00000001001ad15e python`PyRun_InteractiveLoopFlags + 334
>     frame #42: 0x00000001001acfeb python`PyRun_AnyFileExFlags + 139
>     frame #43: 0x00000001001d3378 python`Py_Main + 4632
>     frame #44: 0x00000001000016bd python`main + 509
>     frame #45: 0x00007fffb6073235 libdyld.dylib`start + 1
> {code}
> Note, however, that deserialization works if we first serialize something 
> in the same process. E.g., the following succeeds.
> {code}
> import pyarrow as pa
> pa.serialize(1)
> pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This succeeds!
> {code}
> I have a potential fix/workaround, which I will post momentarily.



