[
https://issues.apache.org/jira/browse/ARROW-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410683#comment-17410683
]
Antoine Pitrou commented on ARROW-13914:
----------------------------------------
We could, but in the end I'm not sure it would be very useful. Presumably, if
you really care about performance, you avoid generating your data in pure Python.
Converting from a NumPy array of integers is over 100x faster than converting
from a Python list of integers:
{code:python}
>>> import numpy as np
>>> import pyarrow as pa
>>> d = list(range(10000))
>>> nd = np.array(d)
>>> %timeit pa.array(d)
288 µs ± 454 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit pa.array(d, type=pa.int64())
234 µs ± 3.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit pa.array(nd)
1.96 µs ± 6.64 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit pa.array(nd, type=pa.int64())
1.81 µs ± 6.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
{code}
> [C++][Python] Optimize type inference when converting from python values
> ------------------------------------------------------------------------
>
> Key: ARROW-13914
> URL: https://issues.apache.org/jira/browse/ARROW-13914
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Krisztian Szucs
> Priority: Minor
>
> Currently we use an extensive set of checks to infer the Arrow type from
> Python sequences.
> Last time I checked with asv, the inference step had significant overhead.
> We could try other approaches to speed up type inference; see the comments at:
> https://github.com/apache/arrow/pull/11076#discussion_r702808196
--
This message was sent by Atlassian Jira
(v8.3.4#803005)