[
https://issues.apache.org/jira/browse/ARROW-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170112#comment-17170112
]
Wes McKinney commented on ARROW-9623:
-------------------------------------
My guess is that it's because NumPy does runtime AVX2 dispatch, meaning it picks AVX2 kernels when the CPU supports them. I made an AVX2
build of pyarrow and I see no performance difference:
{code}
In [5]: arr = np.random.randn(100000000)
In [6]: timeit arr * arr
87.3 ms ± 813 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: pa_arr = pa.array(arr)
In [8]: timeit pc.multiply(pa_arr, pa_arr)
87.5 ms ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
{code}
> [Python] Performance difference between pc.multiply vs pd.multiply
> ------------------------------------------------------------------
>
> Key: ARROW-9623
> URL: https://issues.apache.org/jira/browse/ARROW-9623
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 1.0.0
> Environment: Windows
> Pyarrow 1.0.0
> Reporter: H G
> Priority: Minor
>
> I wanted to report a performance difference I observed between pandas and
> PyArrow.
>
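> {code:python}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.compute as pc
> # One float64 column of 100 million random values
> df = pd.DataFrame(np.random.randn(100000000))
> # pandas: element-wise product of the DataFrame with itself (IPython %timeit)
> %timeit -n 5 -r 5 df.multiply(df)
> # pyarrow: convert to an Arrow Table, then multiply its first column
> # (a ChunkedArray) by itself
> table = pa.Table.from_pandas(df)
> %timeit -n 5 -r 5 pc.multiply(table[0], table[0])
> {code}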
> Results:
> {code}
> %timeit -n 5 -r 5 df.multiply(df)
> 374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
> {code}
>
> {code}
> %timeit -n 5 -r 5 pc.multiply(table[0], table[0])
> 698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
> {code}