[ 
https://issues.apache.org/jira/browse/ARROW-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1695:
----------------------------------
    Labels: pull-request-available  (was: )

> [Serialization] Fix reference counting of numpy arrays created in custom 
> serializer
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-1695
>                 URL: https://issues.apache.org/jira/browse/ARROW-1695
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Philipp Moritz
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> The problem happens with the following code:
> {code}
> import numpy as np
> import pyarrow
> import sys
> class Bar(object):
>     pass
> def bar_custom_serializer(obj):
>     x = np.zeros(4)
>     return x
> def bar_custom_deserializer(serialized_obj):
>     return serialized_obj
> pyarrow._default_serialization_context.register_type(
>     Bar, "Bar", pickle=False,
>     custom_serializer=bar_custom_serializer,
>     custom_deserializer=bar_custom_deserializer)
> pyarrow.serialize(Bar())
> {code}
> After execution of pyarrow.serialize, the interpreter crashes in the garbage 
> collection routine.
> This happens if a numpy array is returned in the custom serializer but there 
> is no other reference to the numpy array. The reason this is not a problem in 
> the current code is that so far we haven't created new numpy arrays in the 
> custom serializer.
> I think the problem here is that the numpy array's reference count drops to 
> zero between the end of SerializeSequences in python_to_arrow.cc and the call 
> to NdarrayToTensor. I'll push a fix later today, which just increments and 
> decrements the reference counts at the appropriate places.
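
To illustrate the lifetime problem described above, here is a minimal pure-Python sketch. It is only an analogy, not the actual fix in python_to_arrow.cc: Payload, custom_serializer, serialize_without_holding and serialize_with_holding are hypothetical names, and Payload stands in for the numpy array created inside the custom serializer. The sketch shows that an object returned by a serializer dies as soon as nobody holds a strong reference to it, and that holding a reference for the duration of the work (the Python-level counterpart of incrementing and later decrementing the reference count) keeps it alive.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for the array created in the custom serializer."""


def custom_serializer():
    # The return value is the only reference to the newly created object.
    return Payload()


def serialize_without_holding():
    # Only a weak reference is kept, so nothing owns the object afterwards.
    ref = weakref.ref(custom_serializer())
    gc.collect()
    # Reports whether the object is still alive when we want to use it.
    return ref() is not None


def serialize_with_holding():
    # Hold a strong reference while the object is in use
    # (analogous to incrementing the reference count).
    held = custom_serializer()
    ref = weakref.ref(held)
    gc.collect()
    alive_while_held = ref() is not None
    # Release the reference once the work is done
    # (analogous to decrementing the reference count).
    del held
    gc.collect()
    released_afterwards = ref() is None
    return alive_while_held, released_afterwards
```

In the first function the object is already gone by the time it would be used; in the second it survives exactly as long as the reference is held and is released afterwards.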



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
