[ 
https://issues.apache.org/jira/browse/ARROW-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3909.
-------------------------------
    Resolution: Not A Problem

I'm a little slow today. The problem is materializing the {{RangeIndex}}: with {{preserve_index=False}} the call is effectively free.

{code}
In [21]: timeit table = pa.Table.from_pandas(df, preserve_index=False)
283 µs ± 5.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}

> [Python] Table.from_pandas call that seemingly should zero copy does not
> ------------------------------------------------------------------------
>
>                 Key: ARROW-3909
>                 URL: https://issues.apache.org/jira/browse/ARROW-3909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.12.0
>
>
> While doing some performance testing, I noticed that a {{Table.from_pandas}} 
> call that ought to be zero-copy / free was taking about 50 ms:
> {code}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> K = 1000
> N = 50000000
> df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
> table = pa.Table.from_pandas(df)
> {code}
> I see:
> {code}
> In [14]: timeit table = pa.Table.from_pandas(df)
> 51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> I haven't determined what's going on (is it counting nulls?), and initial 
> attempts to get a Flamegraph produced a bunch of "unknown" entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
