[ https://issues.apache.org/jira/browse/ARROW-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney closed ARROW-3909.
-------------------------------
Resolution: Not A Problem
I'm a little slow today. The problem is materializing the {{RangeIndex}}. With {{preserve_index=False}}, the conversion takes microseconds:
{code}
In [21]: timeit table = pa.Table.from_pandas(df, preserve_index=False)
283 µs ± 5.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}
> [Python] Table.from_pandas call that seemingly should zero copy does not
> ------------------------------------------------------------------------
>
> Key: ARROW-3909
> URL: https://issues.apache.org/jira/browse/ARROW-3909
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.12.0
>
>
> While doing some performance testing, I noticed that a {{Table.from_pandas}}
> call that ought to be zero-copy / free was taking about 50 ms:
> {code}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> K = 1000
> N = 50000000
> df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
> table = pa.Table.from_pandas(df)
> {code}
> I see:
> {code}
> In [14]: timeit table = pa.Table.from_pandas(df)
> 51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> I haven't determined what's going on (is it counting nulls?), and initial
> attempts to get a Flamegraph produced a bunch of "unknown" entries.