pitrou commented on issue #11239:
URL: https://github.com/apache/arrow/issues/11239#issuecomment-941116357


   For the record, I get the following numbers here:
   
   **pickle5 with copies**
   
   ```pycon
   >>> %timeit persons_pickled = pickle5.dumps(PERSONS, protocol=5)
   39.3 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   >>> %timeit persons_depickled = pickle5.loads(persons_pickled)
   28.9 ms ± 71.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
   
   **pickle5 with out-of-band buffers**
   ```pycon
   >>> %timeit buffers=[]; persons_pickled = pickle5.dumps(PERSONS, protocol=5, buffer_callback=buffers.append)
   231 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   >>> %timeit persons_depickled = pickle5.loads(persons_pickled, buffers=buffers)
   121 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
   ```
   
   **PyArrow serialization**
   ```pycon
   >>> %timeit persons_serialized = pa.serialize(PERSONS, context=context).to_buffer()
   18.6 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   >>> %timeit persons_deserialized = pa.deserialize(persons_serialized, context=context)
   398 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   ```
   
   **Summary table**
   
   |  | Serialization | Deserialization |
   | -- | -- | -- |
   | pickle5 with copies | 39.3 ms | 28.9 ms |
   | pickle5 with out-of-band buffers | 231 µs | 121 µs |
   | PyArrow serialization | 18.6 ms | 398 µs |
   
   **Short analysis**
   
   By default, with `pickle` you pay the price of memory copies for both 
serialization and deserialization. PyArrow serialization avoids those copies, 
but only on the read path (deserialization); serializing still copies. `pickle` 
out-of-band buffers avoid memory copies on _both_ sides.
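
To make the difference concrete, here is a minimal sketch of in-band versus out-of-band pickling. It assumes Python 3.8+ and uses the standard-library `pickle` rather than the `pickle5` backport; `payload` and `record` are made-up stand-ins, not the `PERSONS` data benchmarked above.

```python
import pickle

# Hypothetical stand-in for a large binary payload (not the PERSONS data).
payload = bytearray(10_000_000)

# Wrapping the data in PickleBuffer marks it as eligible for out-of-band transfer.
record = {"name": "example", "data": pickle.PickleBuffer(payload)}

# In-band: the payload bytes are copied into the pickle stream,
# and copied again into a fresh bytearray when loading.
in_band = pickle.dumps(record, protocol=5)
copied = pickle.loads(in_band)

# Out-of-band: the pickler hands each buffer to buffer_callback,
# so the stream itself only holds a small placeholder.
buffers = []
out_of_band = pickle.dumps(record, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band) > len(payload))    # the stream contains the payload
print(len(out_of_band) < 1000)        # the stream is only metadata
# The restored buffer still refers to the original memory: no copy was made.
print(memoryview(restored["data"]).obj is payload)
```

In a real transport the `buffers` list would be sent alongside the pickle stream (for example through shared memory), which is where a copy may or may not occur depending on the channel.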
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

