H G created ARROW-9623:
--------------------------

             Summary: Performance difference between pc.multiply vs pd.multiply
                 Key: ARROW-9623
                 URL: https://issues.apache.org/jira/browse/ARROW-9623
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 1.0.0
         Environment: Windows, PyArrow 1.0.0
            Reporter: H G


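I wanted to report a performance difference observed between pandas and PyArrow. When multiplying a 100-million-element float64 column by itself, pc.multiply takes roughly twice as long on average as DataFrame.multiply (698 ms vs. 374 ms) and shows much higher run-to-run variance.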

```
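# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc

# One column of 100 million float64 values
df = pd.DataFrame(np.random.randn(100000000))
%timeit -n 5 -r 5 df.multiply(df)

# table[0] is the same column as a pyarrow ChunkedArray
table = pa.Table.from_pandas(df)
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])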
```

Results:
```
%timeit -n 5 -r 5 df.multiply(df)
374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```

```
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```
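
For anyone who wants to reproduce this outside IPython, here is a minimal sketch of the same comparison using only the standard `timeit` module. The helper names `time_call` and `compare_multiply` are hypothetical and not part of pandas or PyArrow, and the default size is reduced to 1 million rows so it runs quickly. Pass n=100000000 to match the report. It also checks that both libraries return the same values before timing them.

```python
import statistics
import timeit


def time_call(fn, number=5, repeat=5):
    """Return (mean_ms, std_ms) per call, similar in spirit to `%timeit -n 5 -r 5`."""
    # timeit.repeat returns, for each of `repeat` runs, the total time of `number` calls
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call_ms = [total / number * 1e3 for total in totals]
    return statistics.mean(per_call_ms), statistics.pstdev(per_call_ms)


def compare_multiply(n=1_000_000):
    """Time DataFrame.multiply against pyarrow.compute.multiply on the same data."""
    # Imported here so time_call can be used without these packages installed
    import numpy as np
    import pandas as pd
    import pyarrow as pa
    import pyarrow.compute as pc

    df = pd.DataFrame(np.random.randn(n))
    column = pa.Table.from_pandas(df)[0]  # ChunkedArray of float64

    # Sanity check: both libraries should produce the same products
    expected = df.multiply(df)[0].to_numpy()
    actual = pc.multiply(column, column).to_pandas().to_numpy()
    assert np.allclose(expected, actual)

    results = {
        "pandas": time_call(lambda: df.multiply(df)),
        "pyarrow": time_call(lambda: pc.multiply(column, column)),
    }
    for name, (mean_ms, std_ms) in results.items():
        print(f"{name}: {mean_ms:.1f} ms +/- {std_ms:.1f} ms per call")
    return results
```

Calling `compare_multiply()` prints one line per library. The absolute numbers will depend on the machine, the operating system, and the installed versions, so they are not expected to match the figures above.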



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
