[ https://issues.apache.org/jira/browse/ARROW-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433399#comment-17433399 ]
Micah Kornfield commented on ARROW-12976:
-----------------------------------------
Yeah, given #1 and #2, I think I'll try to simply replicate existing behavior
in C++, even though it can lead to unexpected behavior.
> [Python] Arrow-to-Python conversion is slow
> -------------------------------------------
>
> Key: ARROW-12976
> URL: https://issues.apache.org/jira/browse/ARROW-12976
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Antoine Pitrou
> Assignee: Micah Kornfield
> Priority: Major
>
> It seems that we are roughly 20x to 26x slower than NumPy when converting the
> exact same data to a Python list.
> With integers:
> {code:python}
> >>> arr = np.arange(0,1000, dtype=np.int64)
> >>> %timeit arr.tolist()
> 8.24 µs ± 3.46 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> >>> parr = pa.array(arr)
> >>> %timeit parr.to_pylist()
> 218 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> {code}
> With floats:
> {code:python}
> >>> arr = np.arange(0,1000, dtype=np.float64)
> >>> %timeit arr.tolist()
> 10.2 µs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> >>> parr = pa.array(arr)
> >>> %timeit parr.to_pylist()
> 199 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
> {code}
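
For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using only the standard-library timeit module. The helper names (best_time_us, slowdown, main) are hypothetical and not part of Arrow or NumPy. Note that it reports the best of several runs rather than the mean ± std. dev. that %timeit prints, so absolute numbers will differ somewhat from those quoted above. From the quoted figures, the ratios work out to about 26x for integers (218 / 8.24) and about 20x for floats (199 / 10.2).

```python
import timeit


def best_time_us(func, number=1000, repeat=7):
    """Return the best per-call time in microseconds over `repeat` runs.

    Hypothetical helper: uses min() rather than the mean that %timeit reports.
    """
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


def slowdown(baseline_us, candidate_us):
    """Return how many times slower the candidate is than the baseline."""
    return candidate_us / baseline_us


def main():
    # Imported here so the helpers above stay usable without these packages.
    import numpy as np
    import pyarrow as pa

    for dtype in (np.int64, np.float64):
        arr = np.arange(0, 1000, dtype=dtype)
        parr = pa.array(arr)
        np_us = best_time_us(arr.tolist)
        pa_us = best_time_us(parr.to_pylist)
        print(
            f"{np.dtype(dtype).name}: numpy {np_us:.1f} us, "
            f"pyarrow {pa_us:.1f} us, ratio {slowdown(np_us, pa_us):.1f}x"
        )


if __name__ == "__main__":
    try:
        main()
    except ImportError as exc:
        print(f"numpy and pyarrow are required to run the benchmark: {exc}")
```

Running the script prints one line per dtype; the ratio column is the figure to compare against the roughly 20x gap reported in the issue.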