[
https://issues.apache.org/jira/browse/ARROW-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213051#comment-16213051
]
ASF GitHub Bot commented on ARROW-1695:
---------------------------------------
GitHub user pcmoritz opened a pull request:
https://github.com/apache/arrow/pull/1220
ARROW-1695: [Serialization] Fix reference counting of numpy arrays created
in custom serializer
This uses the NumPyBuffer built into Arrow's Tensor facility to keep alive the
numpy arrays that back the Tensors being serialized. It fixes
https://issues.apache.org/jira/browse/ARROW-1695
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pcmoritz/arrow fix-serialize-tensors
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/arrow/pull/1220.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1220
----
commit 9d879060e4a98fc9258577d3523b2982e19de26a
Author: Philipp Moritz <[email protected]>
Date: 2017-10-20T18:45:24Z
fix handling of numpy arrays generated in the custom serializer methods
----
> [Serialization] Fix reference counting of numpy arrays created in custom
> serializer
> --------------------------------------------------------------------------------------
>
> Key: ARROW-1695
> URL: https://issues.apache.org/jira/browse/ARROW-1695
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Philipp Moritz
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The problem happens with the following code:
> {code}
> import numpy as np
> import pyarrow
> import sys
>
> class Bar(object):
>     pass
>
> def bar_custom_serializer(obj):
>     x = np.zeros(4)
>     return x
>
> def bar_custom_deserializer(serialized_obj):
>     return serialized_obj
>
> pyarrow._default_serialization_context.register_type(Bar, "Bar",
>     pickle=False, custom_serializer=bar_custom_serializer,
>     custom_deserializer=bar_custom_deserializer)
> pyarrow.serialize(Bar())
> {code}
> After execution of pyarrow.serialize, the interpreter crashes in the garbage
> collection routine.
> This happens if a numpy array is returned in the custom serializer but there
> is no other reference to the numpy array. The reason this is not a problem in
> the current code is that so far we haven't created new numpy arrays in the
> custom serializer.
> I think the problem here is that the numpy array hits reference count zero
> between the end of SerializeSequences in python_to_arrow.cc and the call to
> NdarrayToTensor. I'll push a fix later today, which just increases and
> decreases the reference counts at the appropriate places.
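
The lifetime problem described in the quoted issue can be illustrated in plain
Python. This is only a sketch under stated assumptions, not Arrow's C++ code:
Payload, custom_serializer, and the keep_alive list are hypothetical stand-ins
for the numpy array, the user's serializer, and an owning buffer. It shows that
an object whose only reference is dropped is freed immediately, while an extra
owner keeps it alive until the conversion step is done.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for the numpy array built in the custom serializer."""


def custom_serializer():
    # The returned object is referenced only by whoever receives it.
    return Payload()


def serialize_without_owner():
    obj = custom_serializer()
    probe = weakref.ref(obj)  # observes the object without keeping it alive
    del obj                   # last strong reference is gone before "conversion"
    gc.collect()
    return probe() is not None


def serialize_with_owner():
    keep_alive = []           # plays the role of an owning buffer
    obj = custom_serializer()
    probe = weakref.ref(obj)
    keep_alive.append(obj)    # extra reference, comparable to an incref
    del obj
    gc.collect()
    alive = probe() is not None
    keep_alive.clear()        # release once conversion is finished (decref)
    return alive
```

Calling serialize_without_owner() reports that the object is already gone,
while serialize_with_owner() reports that it survived until released.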
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)