[ https://issues.apache.org/jira/browse/ARROW-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
H G updated ARROW-9623:
-----------------------
    Description: 
Wanted to report the performance difference observed between pandas and PyArrow when multiplying a 100,000,000-element float64 column by itself.

{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc
df = pd.DataFrame(np.random.randn(100000000))
%timeit -n 5 -r 5 df.multiply(df)
table = pa.Table.from_pandas(df)
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
{code}
Results:
{noformat}
%timeit -n 5 -r 5 df.multiply(df)
374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
{noformat}

{noformat}
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
{noformat}

  was:
Wanted to report the performance difference observed between Pandas and Pyarrow.

```
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc
df = pd.DataFrame(np.random.randn(100000000))
%timeit -n 5 -r 5 df.multiply(df)
table = pa.Table.from_pandas(df)
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
```
Results:
```
%timeit -n 5 -r 5 df.multiply(df)
374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```
```
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```

        Summary: [Python] Performance difference between pc.multiply vs pd.multiply  (was: Performance difference between pc.multiply vs pd.multiply)

> [Python] Performance difference between pc.multiply vs pd.multiply
> ------------------------------------------------------------------
>
>                 Key: ARROW-9623
>                 URL: https://issues.apache.org/jira/browse/ARROW-9623
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 1.0.0
>         Environment: Windows
> Pyarrow 1.0.0
>            Reporter: H G
>            Priority: Minor
>
> Wanted to report the performance difference observed between pandas and
> PyArrow when multiplying a 100,000,000-element float64 column by itself.
>
> {code:python}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.compute as pc
> df = pd.DataFrame(np.random.randn(100000000))
> %timeit -n 5 -r 5 df.multiply(df)
> table = pa.Table.from_pandas(df)
> %timeit -n 5 -r 5 pc.multiply(table[0],table[0])
> {code}
> Results:
> {noformat}
> %timeit -n 5 -r 5 df.multiply(df)
> 374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
> {noformat}
>
> {noformat}
> %timeit -n 5 -r 5 pc.multiply(table[0],table[0])
> 698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)