Neltherion opened a new issue #11239:
URL: https://github.com/apache/arrow/issues/11239


   It seems that PyArrow has deprecated the `pa.serialize()` and 
`pa.deserialize()` methods and suggests using other options such as Pickle5. 
   
   Pickle5 doesn't seem to match the performance of PyArrow's deprecated 
serialization methods. Is there any proper replacement for 
`pa.serialize()` and `pa.deserialize()`?
   
   Here's a simplified script that compares PyArrow and Pickle when 
serializing and deserializing:
   
   ```
   import time
   
   import numpy as np
   import pickle5
   import pyarrow as pa
   
   
   class Person:
       def __init__(self, Thumbnail: np.ndarray = None):
           if Thumbnail is not None:
               self.Thumbnail: np.ndarray = Thumbnail
           else:
               self.Thumbnail: np.ndarray = np.random.rand(256, 256, 3)
   
   
   def serialize_Person(person):
       return {'Thumbnail': person.Thumbnail}
   
   
   def deserialize_Person(person):
       return Person(person['Thumbnail'])
   
   
   context = pa.SerializationContext()
   context.register_type(
       Person, 'Person',
       custom_serializer=serialize_Person,
       custom_deserializer=deserialize_Person)
   
   PERSONS = [Person() for i in range(100)]
   
   """
   PyArrow
   """
   t1 = time.time()
   persons_serialized = pa.serialize(PERSONS, context=context).to_buffer()
   persons_deserialized = pa.deserialize(persons_serialized, context=context)
   t2 = time.time()
   print(f'PyArrow Time => {t2 - t1}')
   
   """
   Pickle
   """
   t1 = time.time()
   persons_pickled = pickle5.dumps(PERSONS, protocol=5)
   persons_depickled = pickle5.loads(persons_pickled)
   t2 = time.time()
   print(f'Pickle Time => {t2 - t1}')
   ```
   
   The outputs on my system are:
   
   ```
   PyArrow Time => 0.04499983787536621
   Pickle Time => 0.2220008373260498
   ```
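
For reference, the Pickle timing above uses protocol 5 in-band, so every array is copied into the pickle stream. Below is a minimal sketch of protocol 5 with out-of-band buffers, which may be closer to what the deprecation notice has in mind. It assumes Python 3.8+ (where the standard `pickle` module supports protocol 5, so the `pickle5` backport is not needed) and a NumPy version that supports out-of-band pickling. The helper names `dumps_out_of_band` and `loads_out_of_band` are hypothetical, and plain dicts stand in for `Person` objects in the same way `serialize_Person` does. I have not benchmarked this against `pa.serialize()`.

```python
import pickle

import numpy as np


def dumps_out_of_band(obj):
    """Pickle obj with protocol 5, collecting large buffers separately."""
    buffers = []
    # buffer_callback receives a PickleBuffer for each contiguous array
    # instead of copying the array bytes into the pickle stream.
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


def loads_out_of_band(payload, buffers):
    """Rebuild the object from the pickle stream plus its buffers."""
    return pickle.loads(payload, buffers=buffers)


# Same shape of data as serialize_Person produces: one dict per person.
records = [{'Thumbnail': np.random.rand(256, 256, 3)} for _ in range(4)]

payload, buffers = dumps_out_of_band(records)
restored = loads_out_of_band(payload, buffers)

print(f'pickle stream: {len(payload)} bytes, out-of-band buffers: {len(buffers)}')
```

The pickle stream then carries only metadata, and the array data travels in `buffers`. Whether this helps in practice depends on how the buffers are transported afterwards (shared memory, sockets, files).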



