[ https://issues.apache.org/jira/browse/ARROW-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704053#comment-16704053 ]
Wes McKinney commented on ARROW-3909:
-------------------------------------
Note that no memory is being allocated, so this is weird:
{code}
In [3]: table = pa.Table.from_pandas(df)
In [4]: pa.total_allocated_bytes()
Out[4]: 0
{code}
> [Python] Table.from_pandas call that seemingly should zero copy does not
> ------------------------------------------------------------------------
>
> Key: ARROW-3909
> URL: https://issues.apache.org/jira/browse/ARROW-3909
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.12.0
>
>
> While doing some performance testing, I noticed that a {{Table.from_pandas}}
> call that ought to be zero-copy / free was taking 50 ms:
> {code}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> K = 1000
> N = 50000000
> df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
> table = pa.Table.from_pandas(df)
> {code}
> I see:
> {code}
> In [14]: timeit table = pa.Table.from_pandas(df)
> 51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> I haven't determined what's going on (is it counting nulls?), and initial
> attempts to get a Flamegraph produced a bunch of "unknown" entries.