[ 
https://issues.apache.org/jira/browse/ARROW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227248#comment-16227248
 ] 

ASF GitHub Bot commented on ARROW-1753:
---------------------------------------

wesm closed pull request #1272: ARROW-1753: [Python] Provide for matching subclasses with register_type in serialization context
URL: https://github.com/apache/arrow/pull/1272

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/cpp/src/arrow/ipc/metadata-internal.cc b/cpp/src/arrow/ipc/metadata-internal.cc
index f04e9b05a..f0f0f6758 100644
--- a/cpp/src/arrow/ipc/metadata-internal.cc
+++ b/cpp/src/arrow/ipc/metadata-internal.cc
@@ -72,7 +72,7 @@ MetadataVersion GetMetadataVersion(flatbuf::MetadataVersion version) {
     case flatbuf::MetadataVersion_V4:
       // Arrow >= 0.8
       return MetadataVersion::V4;
-      // Add cases as other versions become available
+    // Add cases as other versions become available
     default:
       return MetadataVersion::V4;
   }
diff --git a/python/pyarrow/serialization.pxi b/python/pyarrow/serialization.pxi
index 4e9ab8eb3..6b7227797 100644
--- a/python/pyarrow/serialization.pxi
+++ b/python/pyarrow/serialization.pxi
@@ -88,17 +88,26 @@ cdef class SerializationContext:
             self.custom_deserializers[type_id] = custom_deserializer
 
     def _serialize_callback(self, obj):
-        if type(obj) not in self.type_to_type_id:
+        found = False
+        for type_ in type(obj).__mro__:
+            if type_ in self.type_to_type_id:
+                found = True
+                break
+
+        if not found:
             raise SerializationCallbackError(
                 "pyarrow does not know how to "
-                "serialize objects of type {}.".format(type(obj)), obj)
-        type_id = self.type_to_type_id[type(obj)]
+                "serialize objects of type {}.".format(type(obj)), obj
+            )
+
+        # use the closest match to type(obj)
+        type_id = self.type_to_type_id[type_]
         if type_id in self.types_to_pickle:
             serialized_obj = {"data": pickle.dumps(obj), "pickle": True}
         elif type_id in self.custom_serializers:
             serialized_obj = {"data": self.custom_serializers[type_id](obj)}
         else:
-            if is_named_tuple(type(obj)):
+            if is_named_tuple(type_):
                 serialized_obj = {}
                 serialized_obj["_pa_getnewargs_"] = obj.__getnewargs__()
             elif hasattr(obj, "__dict__"):
diff --git a/python/pyarrow/serialization.py b/python/pyarrow/serialization.py
index 9dc8ee6de..2b47513fd 100644
--- a/python/pyarrow/serialization.py
+++ b/python/pyarrow/serialization.py
@@ -69,6 +69,8 @@ def _deserialize_default_dict(data):
         type(lambda: 0), "function",
         pickle=True)
 
+    serialization_context.register_type(type, "type", pickle=True)
+
     # ----------------------------------------------------------------------
     # Set up serialization for numpy with dtype object (primitive types are
     # handled efficiently with Arrow's Tensor facilities, see
diff --git a/python/pyarrow/tests/test_serialization.py b/python/pyarrow/tests/test_serialization.py
index 7878a0922..b0c5bc49e 100644
--- a/python/pyarrow/tests/test_serialization.py
+++ b/python/pyarrow/tests/test_serialization.py
@@ -416,3 +416,69 @@ class TempClass(object):
     with pytest.raises(pa.DeserializationCallbackError) as err:
         serialized_object.deserialize(deserialization_context)
     assert err.value.type_id == 20*b"\x00"
+
+
+def test_fallback_to_subclasses():
+
+    class SubFoo(Foo):
+        def __init__(self):
+            Foo.__init__(self)
+
+    # should be able to serialize/deserialize an instance
+    # if a base class has been registered
+    serialization_context = pa.SerializationContext()
+    serialization_context.register_type(Foo, "Foo")
+
+    subfoo = SubFoo()
+    # should fall back to the Foo serializer
+    serialized_object = pa.serialize(subfoo, serialization_context)
+
+    reconstructed_object = serialized_object.deserialize(
+        serialization_context
+    )
+    assert type(reconstructed_object) == Foo
+
+
+class Serializable(object):
+    pass
+
+
+def serialize_serializable(obj):
+    return {"type": type(obj), "data": obj.__dict__}
+
+
+def deserialize_serializable(obj):
+    val = obj["type"].__new__(obj["type"])
+    val.__dict__.update(obj["data"])
+    return val
+
+
+class SerializableClass(Serializable):
+    def __init__(self):
+        self.value = 3
+
+
+def test_serialize_subclasses():
+
+    # This test shows how subclasses can be handled in an idiomatic way
+    # by having only a serializer for the base class
+
+    # This technique should however be used with care, since pickling
+    # type(obj) with cloudpickle will include the full class definition
+    # in the serialized representation.
+    # This means the class definition is part of every instance of the
+    # object, which in general is not desirable; registering all subclasses
+    # with register_type will result in faster and more memory
+    # efficient serialization.
+
+    serialization_context.register_type(
+        Serializable, "Serializable",
+        custom_serializer=serialize_serializable,
+        custom_deserializer=deserialize_serializable)
+
+    a = SerializableClass()
+    serialized = pa.serialize(a)
+
+    deserialized = serialized.deserialize()
+    assert type(deserialized).__name__ == SerializableClass.__name__
+    assert deserialized.value == 3
diff --git a/python/requirements.txt b/python/requirements.txt
index d2e28a774..8d0c33afa 100644
--- a/python/requirements.txt
+++ b/python/requirements.txt
@@ -1,4 +1,4 @@
 pytest
-cloudpickle
+cloudpickle>=0.4.0
 numpy>=1.10.0
 six


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [Python] Provide for matching subclasses with register_type in serialization context
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-1753
>                 URL: https://issues.apache.org/jira/browse/ARROW-1753
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> Copied from https://github.com/apache/arrow/issues/1269
> To allow factoring out serialization code into a common base class, it would be
> useful if register_type matched all subclasses rather than requiring an exact
> match on the specific type.
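
For readers who want to see the requested behaviour without building pyarrow, here is a minimal sketch of subclass matching through the method resolution order. It mirrors the loop over type(obj).__mro__ in the diff, but TypeRegistry and its lookup method are hypothetical names made up for this illustration and are not part of the pyarrow API. Only the type_to_type_id attribute name is borrowed from the patch.

```python
class TypeRegistry:
    """Hypothetical stand-in for the type lookup in SerializationContext."""

    def __init__(self):
        # Maps a registered class to its type id, as in the patch
        self.type_to_type_id = {}

    def register_type(self, type_, type_id):
        self.type_to_type_id[type_] = type_id

    def lookup(self, obj):
        # Walk the MRO so the most specific registered class wins:
        # type(obj) itself comes first, then its bases in resolution order.
        for type_ in type(obj).__mro__:
            if type_ in self.type_to_type_id:
                return self.type_to_type_id[type_]
        raise TypeError(
            "no serializer registered for objects of type {}".format(type(obj))
        )


class Foo:
    pass


class SubFoo(Foo):
    pass


registry = TypeRegistry()
registry.register_type(Foo, "Foo")

foo_id = registry.lookup(Foo())        # exact match on Foo
subfoo_id = registry.lookup(SubFoo())  # falls back to the base class Foo
```

Because the MRO lists the object's own class before its bases, registering SubFoo later would take precedence over the Foo entry for SubFoo instances, while plain Foo instances would keep using the Foo entry.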



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
