[jira] [Commented] (ARROW-1784) [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing the BlockManager rather than coercing to Arrow format

Wes McKinney (JIRA) Thu, 09 Nov 2017 11:12:27 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246309#comment-16246309 ]


Wes McKinney commented on ARROW-1784:
-------------------------------------

It's hard to prevent memory doubling on receipt if you go column-wise (e.g. 
{{pd.DataFrame(data)}}, where data is a dict of columns, will double memory). 
So I think as long as we avoid memory doubling, we are good.
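
A minimal sketch of the column-wise case mentioned above, assuming NumPy and a reasonably recent pandas are installed. The column names and sizes are made up for illustration. It only checks whether the constructed frame still points at the incoming arrays; if it does not, the data was copied, so both copies are alive at once.

```python
import numpy as np
import pandas as pd

# Hypothetical column data standing in for arrays received over the wire.
n = 1_000
columns = {
    "a": np.arange(n, dtype="float64"),
    "b": np.ones(n, dtype="float64"),
}

# Column-wise construction from a dict of arrays.
df = pd.DataFrame(columns)

# If the frame reused the incoming buffers, these checks would be True.
shares_a = np.shares_memory(df["a"].to_numpy(), columns["a"])
shares_b = np.shares_memory(df["b"].to_numpy(), columns["b"])
print(shares_a, shares_b)
```

When neither check reports shared memory, the frame holds its own copy of every column, which is the doubling described in the comment.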

> [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing 
> the BlockManager rather than coercing to Arrow format
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1784
>                 URL: https://issues.apache.org/jira/browse/ARROW-1784
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>             Fix For: 0.8.0
>
>
> See discussion in https://github.com/dask/distributed/pull/931
> This will permit zero-copy reads for DataFrames not containing Python 
> objects. In the event of an {{ObjectBlock}}, these arrays will be no worse 
> than pickle to reconstruct on the receiving side.
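
The zero-copy idea in the issue description can be illustrated with plain NumPy. This is only a sketch under stated assumptions, not how {{pyarrow.serialize}} is implemented: the `block`, `payload`, and `meta` names are hypothetical, and a single 2-D float64 array stands in for one homogeneous block of a DataFrame.

```python
import numpy as np

# Hypothetical stand-in for one homogeneous block: 2 columns x 4 rows.
block = np.arange(8, dtype="float64").reshape(2, 4)

# "Send": raw bytes plus the metadata needed to rebuild the block.
payload = block.tobytes()
meta = {"dtype": str(block.dtype), "shape": block.shape}

# "Receive": wrap the received bytes as an array without copying them.
received = np.frombuffer(payload, dtype=meta["dtype"]).reshape(meta["shape"])

# The rebuilt block is a view over the payload buffer itself.
raw_view = np.frombuffer(payload, dtype="uint8")
zero_copy = np.shares_memory(received, raw_view)

# Rebuilding the same block column by column allocates new memory instead.
restacked = np.vstack([received[0], received[1]])
restacked_shares = np.shares_memory(restacked, received)
print(zero_copy, restacked_shares)
```

Because the whole block arrives as one contiguous buffer, the receiver can wrap it directly; assembling it from separate columns forces an extra allocation. Object-dtype data holds pointers to Python objects rather than raw values, so this trick does not apply there.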



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
