[ https://issues.apache.org/jira/browse/ARROW-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-1695:
----------------------------------
Labels: pull-request-available (was: )
> [Serialization] Fix reference counting of numpy arrays created in custom
> serializer
> --------------------------------------------------------------------------------------
>
> Key: ARROW-1695
> URL: https://issues.apache.org/jira/browse/ARROW-1695
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Philipp Moritz
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> The problem happens with the following code:
> {code}
> import numpy as np
> import pyarrow
> import sys
> class Bar(object):
>     pass
> def bar_custom_serializer(obj):
>     x = np.zeros(4)
>     return x
> def bar_custom_deserializer(serialized_obj):
>     return serialized_obj
> pyarrow._default_serialization_context.register_type(Bar, "Bar",
>     pickle=False, custom_serializer=bar_custom_serializer,
>     custom_deserializer=bar_custom_deserializer)
> pyarrow.serialize(Bar())
> {code}
> After execution of pyarrow.serialize, the interpreter crashes in the garbage
> collection routine.
> This happens if the custom serializer returns a numpy array and nothing else
> holds a reference to that array. The current code has not hit this problem
> because, so far, we haven't created new numpy arrays in a custom
> serializer.
> I think the problem here is that the numpy array hits reference count zero
> between the end of SerializeSequences in python_to_arrow.cc and the call to
> NdarrayToTensor. I'll push a fix later today, which just increases and
> decreases the reference counts at the appropriate places.