Wes McKinney created ARROW-1854:
-----------------------------------

             Summary: [Python] Improve performance of serializing object dtype ndarrays
                 Key: ARROW-1854
                 URL: https://issues.apache.org/jira/browse/ARROW-1854
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.8.0


I haven't looked carefully at the hot path for this, but I would expect these 
statements to have roughly the same performance (offloading the ndarray 
serialization to pickle):

{code}
In [1]: import pickle

In [2]: import numpy as np

In [3]: import pyarrow as pa

In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)

In [5]: timeit serialized = pa.serialize(arr).to_buffer()
10 loops, best of 3: 27.1 ms per loop

In [6]: timeit pickled = pickle.dumps(arr)
100 loops, best of 3: 6.03 ms per loop
{code}
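
For anyone who wants to reproduce the comparison outside IPython, here is a rough standalone sketch of the same measurement. The helper names (best_ms, run_benchmark) are made up for illustration and are not part of pyarrow. The pa.serialize call is guarded because it may not be present in every pyarrow release, in which case only the pickle timing is reported.

```python
import pickle
import timeit

import numpy as np

try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional for this sketch
    pa = None


def best_ms(fn, number=5, repeat=3):
    """Best per-call time in milliseconds (hypothetical helper)."""
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    return min(timings) / number * 1e3


def run_benchmark(n=100_000):
    """Time each available serializer on the array from the report."""
    # Same payload as in the report: object dtype holding str and None.
    arr = np.array(['foo', 'bar', None] * n, dtype=object)

    candidates = {'pickle.dumps': lambda: pickle.dumps(arr)}
    # Assumption: pa.serialize may be missing in some pyarrow releases,
    # so only time it when the attribute is actually available.
    if pa is not None and hasattr(pa, 'serialize'):
        candidates['pa.serialize'] = lambda: pa.serialize(arr).to_buffer()

    return {name: best_ms(fn) for name, fn in candidates.items()}


if __name__ == '__main__':
    for name, ms in run_benchmark().items():
        print(f'{name}: {ms:.2f} ms per call')
```

Absolute numbers will differ by machine and library version; the ratio between the two timings is the interesting part.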

[~robertnishihara] [~pcmoritz] I encountered this while working on ARROW-1783, 
but it can likely be resolved independently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
