[
https://issues.apache.org/jira/browse/ARROW-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410683#comment-17410683
]
Antoine Pitrou commented on ARROW-13914:
----------------------------------------
We could, but in the end I'm not sure it would be very useful. Presumably, if
you really care about performance, you avoid generating your data in pure Python.
Converting from a NumPy array of integers is over 100x faster than converting
from a Python list of integers:
{code:python}
>>> import numpy as np
>>> import pyarrow as pa
>>> d = list(range(10000))
>>> nd = np.array(d)
>>> %timeit pa.array(d)
288 µs ± 454 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit pa.array(d, type=pa.int64())
234 µs ± 3.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit pa.array(nd)
1.96 µs ± 6.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit pa.array(nd, type=pa.int64())
1.81 µs ± 6.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
{code}
> [C++][Python] Optimize type inference when converting from python values
> ------------------------------------------------------------------------
>
> Key: ARROW-13914
> URL: https://issues.apache.org/jira/browse/ARROW-13914
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Krisztian Szucs
> Priority: Minor
>
> Currently we use an extensive set of checks to infer the Arrow type from
> Python sequences.
> Last time I checked with asv, the inference step had significant overhead.
> We could try other approaches to speed up type inference; see the comments at:
> https://github.com/apache/arrow/pull/11076#discussion_r702808196
--
This message was sent by Atlassian Jira
(v8.3.4#803005)