I'm not aware of anyone looking at it.

Will out-of-band pickling "just work" in Beam for types that implement the
correct interface in Python 3.8?

On Tue, May 25, 2021 at 2:43 PM Evan Galpin <[email protected]> wrote:

> +1
>
> FWIW I recently ran into the exact case you described (high serialization
> cost). The solution was to implement some not-so-intuitive alternative
> transforms in my case, but I would have very much appreciated faster
> serialization performance.
>
> Thanks,
> Evan
>
> On Tue, May 25, 2021 at 15:26 Stephan Hoyer <[email protected]> wrote:
>
>> Has anyone looked into out of band pickling for Beam's Python SDK, i.e.,
>> Pickle protocol version 5?
>> https://www.python.org/dev/peps/pep-0574/
>> https://docs.python.org/3/library/pickle.html#out-of-band-buffers
>>
>> For Beam pipelines passing around NumPy arrays (or collections of NumPy
>> arrays, like pandas or Xarray) I've noticed that serialization costs can be
>> significant. Beam seems to currently incur at least one one (maybe two)
>> unnecessary memory copies.
>>
>> Pickle protocol version 5 exists for solving exactly this problem. You
>> can serialize collections of arbitrary Python objects in a fully streaming
>> fashion using memory buffers. This is a Python 3.8 feature, but the
>> "pickle5" library provides a backport to Python 3.6 and 3.7. It has been
>> supported by NumPy since version 1.16, released in January 2019.
>>
>> Cheers,
>> Stephan
>>
>

Reply via email to