[jira] [Commented] (ARROW-1854) [Python] Improve performance of serializing object dtype ndarrays

Robert Nishihara (JIRA) Sat, 25 Nov 2017 21:37:34 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265914#comment-16265914 ]


Robert Nishihara commented on ARROW-1854:
-----------------------------------------

That would certainly work. It wouldn't give us any of the benefits of using 
Arrow, but for NumPy arrays of general Python objects, we probably shouldn't 
expect those anyway.

It may be as simple as changing the custom serializer/deserializer. I'll take a 
quick look at that.
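
As a rough illustration of the idea (not the actual pyarrow change), a custom 
serializer/deserializer pair for object dtype arrays could hand the whole 
array to pickle. The function names below are hypothetical, and the sketch 
deliberately leaves out pyarrow's own type-registration step:

```python
import pickle

import numpy as np


def serialize_object_array(arr):
    """Hypothetical custom serializer: pickle the entire object-dtype array."""
    if arr.dtype.kind != "O":
        raise TypeError("expected an ndarray with dtype=object")
    # One pickle call for the whole array, instead of handling each element.
    return pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize_object_array(data):
    """Hypothetical custom deserializer: the inverse of the function above."""
    return pickle.loads(data)


arr = np.array(['foo', 'bar', None] * 1000, dtype=object)
payload = serialize_object_array(arr)
restored = deserialize_object_array(payload)
```

The payload is an opaque pickle byte string, so nothing Arrow-specific can be 
done with it, which is the trade-off mentioned above.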

> [Python] Improve performance of serializing object dtype ndarrays
> -----------------------------------------------------------------
>
>                 Key: ARROW-1854
>                 URL: https://issues.apache.org/jira/browse/ARROW-1854
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>             Fix For: 0.8.0
>
>
> I haven't looked carefully at the hot path for this, but I would expect these 
> statements to have roughly the same performance (offloading the ndarray 
> serialization to pickle):
> {code}
> In [1]: import pickle
> In [2]: import numpy as np
> In [3]: import pyarrow as pa
> In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)
> In [5]: timeit serialized = pa.serialize(arr).to_buffer()
> 10 loops, best of 3: 27.1 ms per loop
> In [6]: timeit pickled = pickle.dumps(arr)
> 100 loops, best of 3: 6.03 ms per loop
> {code}
> [~robertnishihara] [~pcmoritz] I encountered this while working on 
> ARROW-1783, but it can likely be resolved independently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
