[ https://issues.apache.org/jira/browse/ARROW-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17661697#comment-17661697 ]
Rok Mihevc commented on ARROW-4675:
-----------------------------------

This issue has been migrated to [issue #21205|https://github.com/apache/arrow/issues/21205] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] Error serializing bool ndarray in py2 and deserializing in py3
> -----------------------------------------------------------------------
>
>                 Key: ARROW-4675
>                 URL: https://issues.apache.org/jira/browse/ARROW-4675
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0
>         Environment: * pyarrow 0.12.0
> * numpy 1.16.1
> * Python 3.7.0, 2.7.15
> * (macOS 10.13.6)
>            Reporter: Gabe Joseph
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{np.bool}} is the only dtype I've found that causes this issue. Both empty
> and non-empty arrays cause it.
> The issue only manifests from py2 to py3; staying within the same version
> succeeds, as does serializing from py3 and deserializing in py2.
> This appears to just be due to Python 2 {{str}} being deserialized in Python
> 3 as {{bytes}}; it should be {{unicode}} on the py2 end to come back as
> {{str}} in py3. I suppose something in the serialization implementation is
> writing the dtype (just for bool arrays?) using a {{str}}, but haven't dug
> into it yet.
> {code:bash}
> (two)bash-3.2$ python cereal.py
> (two)bash-3.2$ cat cereal.py
> # Python 2
> import numpy as np
> import pyarrow as pa
> data = np.array([], dtype=np.dtype('bool'))
> buf = pa.serialize(data).to_buffer()
> outstream = pa.output_stream("buffer")
> outstream.write(buf)
> outstream.close()
> # ...switch to python 3 venv...
> (three)bash-3.2$ cat decereal.py
> # Python 3
> import numpy as np
> import pyarrow as pa
> instream = pa.input_stream("buffer")
> buf = instream.read()
> data = pa.deserialize(buf)
> print(data)
> (three)bash-3.2$ python3 decereal.py
> Traceback (most recent call last):
>   File "decereal.py", line 10, in <module>
>     data = pa.deserialize(buf)
>   File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
>   File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
>   File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
>   File "pyarrow/serialization.pxi", line 175, in pyarrow.lib.SerializationContext._deserialize_callback
> TypeError: can only concatenate str (not "bytes") to str
> {code}
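
The str/bytes mismatch described in the report can be reproduced in plain Python 3 without pyarrow. The sketch below is only an illustration of that failure mode, not the pyarrow implementation or its fix: `as_text` is a hypothetical helper, and `name_from_py2` is an assumed stand-in for a value that Python 2 wrote as a native {{str}} and Python 3 reads back as {{bytes}}.

```python
def as_text(value, encoding="ascii"):
    """Return value as str, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Assumed stand-in: a name written by Python 2 as a native str,
# which a Python 3 reader sees as bytes.
name_from_py2 = b"bool"

# Concatenating str and bytes raises TypeError on Python 3,
# the same kind of error shown in the traceback above.
error_message = None
try:
    label = "dtype: " + name_from_py2
except TypeError as exc:
    error_message = str(exc)

# Decoding before concatenation avoids the error.
label = "dtype: " + as_text(name_from_py2)
print(label)
```

Writing the value as {{unicode}} on the Python 2 side, as the report suggests, would make a decode step like this unnecessary on the reading side.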