ASF GitHub Bot commented on ARROW-2121:

robertnishihara commented on issue #1581: ARROW-2121: [Python] Handle object 
arrays directly in pandas serializer.
URL: https://github.com/apache/arrow/pull/1581#issuecomment-364573786
   Some performance numbers. They vary somewhat from run to run, so treat 
small differences as noise.
   import pyarrow as pa
   import pandas as pd

   df = pd.DataFrame(data={str(i): [i, str(i)] for i in range(10 ** 6)})

   Before this PR:

   context = pa.pandas_serialization_context()
   %time s = pa.serialize(df, context=context).to_buffer()  # 570ms
   %time d = pa.deserialize(s, context=context)  # 485ms
   %timeit s = pa.serialize(df, context=context).to_buffer()  # 482ms
   %timeit d = pa.deserialize(s, context=context)  # 376ms

   After this PR:

   %time s = pa.serialize(df).to_buffer()  # 577ms
   %time d = pa.deserialize(s)  # 672ms
   %timeit s = pa.serialize(df).to_buffer()  # 467ms
   %timeit d = pa.deserialize(s)  # 349ms
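
   Because the timings above vary between runs, it can help to report a best, median, and worst time over several repeats rather than a single figure. The following is only a rough sketch and not part of the PR: `summarize_timings` is a hypothetical helper, and `pickle.dumps` on a small dictionary stands in for `pa.serialize(df)` so the snippet runs with the standard library alone.

```python
import pickle
import statistics
import timeit


def summarize_timings(func, repeat=5, number=1):
    """Return (best, median, worst) seconds per call over `repeat` runs.

    Hypothetical helper for illustration; not part of pyarrow or pandas.
    """
    raw = timeit.repeat(func, repeat=repeat, number=number)
    # timeit.repeat reports the total time for `number` calls in each run.
    per_call = [t / number for t in raw]
    return min(per_call), statistics.median(per_call), max(per_call)


# Assumption: pickle on a small dict is a stand-in workload. To reproduce the
# benchmark above, pass a callable wrapping pa.serialize(df).to_buffer().
payload = {str(i): [i, str(i)] for i in range(10 ** 4)}

best, median, worst = summarize_timings(lambda: pickle.dumps(payload))
print(f"best {best * 1e3:.2f} ms, median {median * 1e3:.2f} ms, "
      f"worst {worst * 1e3:.2f} ms")
```

   A large gap between the best and worst values suggests that a difference of a few tens of milliseconds between two runs is probably within noise.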

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

> Consider special casing object arrays in pandas serializers.
> ------------------------------------------------------------
>                 Key: ARROW-2121
>                 URL: https://issues.apache.org/jira/browse/ARROW-2121
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Robert Nishihara
>            Priority: Major
>              Labels: pull-request-available

This message was sent by Atlassian JIRA
