[ 
https://issues.apache.org/jira/browse/ARROW-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277936#comment-16277936
 ] 

ASF GitHub Bot commented on ARROW-1784:
---------------------------------------

mrocklin commented on issue #1390: ARROW-1784: [Python] Enable zero-copy 
serialization, deserialization of pandas.DataFrame via components
URL: https://github.com/apache/arrow/pull/1390#issuecomment-349180433
 
 
   Thank you for putting this together.  I look forward to trying this out with 
Dask and seeing if it relieves the memory pressure we're seeing when sending 
dataframes.  What does the current dev-build process look like?  I think I read 
that you all had set up nightly builds on the twosigma channel?
   
   > The impact of this is that when a DataFrame has no data that requires 
pickling, the reconstruction is zero-copy. I will post some benchmarks to 
illustrate the impact of this. The performance improvements are pretty 
remarkable, nearly 1000x speedup on a large DataFrame.
   
   This is to be expected, right?  
   
   > serialize with Arrow table as intermediary: 1.64s in, 1.44s out
   > serialize using pickle: 623ms in, 489ms out
   > serialize using component method: 554ms in, 408ms out
   
   That's surprisingly nice.  Do you have a sense for what is going on here?  
100ms in copying memory?
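
The zero-copy reconstruction described in the quote above can be illustrated with a small sketch. It is not the pyarrow implementation: as a stand-in it uses the standard library's pickle protocol 5 out-of-band buffers (Python 3.8+), and plain `array.array` columns instead of pandas blocks. The helper names `dumps_components` and `loads_components` are hypothetical, chosen only for this illustration.

```python
import pickle
from array import array


def dumps_components(columns):
    """Pickle the column layout in-band and collect the raw buffers out-of-band."""
    buffers = []
    wrapped = {name: pickle.PickleBuffer(col) for name, col in columns.items()}
    # buffer_callback receives each PickleBuffer instead of copying its bytes
    # into the pickle stream.
    payload = pickle.dumps(wrapped, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


def loads_components(payload, buffers):
    """Rebuild the columns as memoryviews over the supplied buffers (no copy)."""
    restored = pickle.loads(payload, buffers=buffers)
    return {name: memoryview(buf) for name, buf in restored.items()}


# Stand-in for a DataFrame with only fixed-width (non-object) columns.
columns = {"x": array("d", [1.0, 2.0, 3.0]), "y": array("q", [10, 20, 30])}

payload, buffers = dumps_components(columns)
views = loads_components(payload, buffers)

# A write through the original is visible in the reconstructed view,
# which shows that both refer to the same memory.
columns["x"][0] = 99.0
shares_memory = views["x"][0] == 99.0

# A plain pickle round trip copies the data, so the copy is unaffected
# by later writes to the original.
copied = pickle.loads(pickle.dumps(columns))
columns["x"][1] = -1.0
copy_is_independent = copied["x"][1] == 2.0
```

In this sketch the only bytes that pass through the pickle stream are the column names and buffer placeholders, so the cost of the round trip does not grow with the size of the columns. Columns holding arbitrary Python objects have no raw buffer to hand over and would still need ordinary pickling.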

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing 
> the BlockManager rather than coercing to Arrow format
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1784
>                 URL: https://issues.apache.org/jira/browse/ARROW-1784
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> See discussion in https://github.com/dask/distributed/pull/931
> This will permit zero-copy reads for DataFrames not containing Python 
> objects. In the event of an {{ObjectBlock}}, these arrays will be no worse 
> than pickle to reconstruct on the receiving side.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
