[
https://issues.apache.org/jira/browse/ARROW-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410642#comment-17410642
]
Krisztian Szucs commented on ARROW-13914:
-----------------------------------------
Even though {{make_unions_}} is always false, I get the following results
locally:
{code}
In [29]: %timeit pa.array(data)
1.31 ms ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [30]: %timeit pa.array(data, type=ty)
647 µs ± 9.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [31]: %timeit pa.infer_type(data)
669 µs ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}
So type inference roughly doubles the conversion time.
> [C++][Python] Optimize type inference when converting from python values
> ------------------------------------------------------------------------
>
> Key: ARROW-13914
> URL: https://issues.apache.org/jira/browse/ARROW-13914
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Krisztian Szucs
> Priority: Minor
>
> Currently we use an extensive set of checks to infer the arrow type from
> python sequences.
> Last time I checked using asv, the inference step had a significant overhead.
> We could try other approaches to speed up the type inference; see the comments:
> https://github.com/apache/arrow/pull/11076#discussion_r702808196
--
This message was sent by Atlassian Jira
(v8.3.4#803005)