[ https://issues.apache.org/jira/browse/ARROW-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Bowman updated ARROW-1854:
--------------------------------
Attachment: text.html
I’m out of the office for vacation, followed by the SAS Winter Holiday, until
Tuesday, January 2nd, 2018.
-Brian
On Nov 24, 2017, at 3:16 PM, Wes McKinney (JIRA) <[email protected]> wrote:
Wes McKinney created ARROW-1854:
-----------------------------------
Summary: [Python] Improve performance of serializing object dtype ndarrays
Key: ARROW-1854
URL: https://issues.apache.org/jira/browse/ARROW-1854
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Wes McKinney
Fix For: 0.8.0
I haven't looked carefully at the hot path for this, but I would expect these
statements to have roughly the same performance (offloading the ndarray
serialization to pickle):
{code}
In [1]: import pickle
In [2]: import numpy as np
In [3]: import pyarrow as pa
In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)
In [5]: timeit serialized = pa.serialize(arr).to_buffer()
10 loops, best of 3: 27.1 ms per loop
In [6]: timeit pickled = pickle.dumps(arr)
100 loops, best of 3: 6.03 ms per loop
{code}
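For anyone who wants to rerun the comparison above outside IPython, here is a minimal sketch using the standard timeit module. The helper name best_of and the script layout are illustrative assumptions, not part of the original report, and the pyarrow half only runs on a pyarrow version that still provides pa.serialize. Absolute timings will differ by machine.

```python
import pickle
import timeit


def best_of(fn, number=10, repeat=3):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls.

    Hypothetical helper: mirrors the "N loops, best of 3" style that
    IPython's timeit reports.
    """
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    return min(timings) / number * 1e3


def main():
    try:
        import numpy as np
    except ImportError:
        print("numpy is not installed; nothing to benchmark")
        return

    # Same array as in the report: 300,000 object-dtype elements.
    arr = np.array(['foo', 'bar', None] * 100000, dtype=object)

    print("pickle.dumps:  %.2f ms per loop" % best_of(lambda: pickle.dumps(arr)))

    try:
        import pyarrow as pa
        serialize = pa.serialize  # assumption: only present in older pyarrow releases
    except (ImportError, AttributeError):
        print("pa.serialize is unavailable; skipping the pyarrow timing")
        return

    print("pa.serialize:  %.2f ms per loop"
          % best_of(lambda: serialize(arr).to_buffer()))


if __name__ == "__main__":
    main()
```

Run as a script, it prints one line per serializer it was able to import.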
[~robertnishihara] [~pcmoritz] I encountered this while working on ARROW-1783,
but it can likely be resolved independently
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
> [Python] Improve performance of serializing object dtype ndarrays
> -----------------------------------------------------------------
>
> Key: ARROW-1854
> URL: https://issues.apache.org/jira/browse/ARROW-1854
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: text.html
>
>
> I haven't looked carefully at the hot path for this, but I would expect these
> statements to have roughly the same performance (offloading the ndarray
> serialization to pickle):
> {code}
> In [1]: import pickle
> In [2]: import numpy as np
> In [3]: import pyarrow as pa
> In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)
> In [5]: timeit serialized = pa.serialize(arr).to_buffer()
> 10 loops, best of 3: 27.1 ms per loop
> In [6]: timeit pickled = pickle.dumps(arr)
> 100 loops, best of 3: 6.03 ms per loop
> {code}
> [~robertnishihara] [~pcmoritz] I encountered this while working on
> ARROW-1783, but it can likely be resolved independently