[jira] [Commented] (ARROW-13690) [Python] Use IPC writing code for pickling RecordBatches

Joris Van den Bossche (Jira) Mon, 23 Aug 2021 04:44:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403137#comment-17403137
 ]


Joris Van den Bossche commented on ARROW-13690:
-----------------------------------------------

Somewhat related: using IPC for pickling would also help ensure that we don't 
pickle the full buffer of a sliced array (see ARROW-10739). I don't know 
whether there are significant downsides to always using IPC. I suppose that 
for simple/small Arrays it would add some overhead, since we need to put 
those in a RecordBatch to use the IPC machinery?

> [Python] Use IPC writing code for pickling RecordBatches
> --------------------------------------------------------
>
>                 Key: ARROW-13690
>                 URL: https://issues.apache.org/jira/browse/ARROW-13690
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Micah Kornfield
>            Priority: Major
>
> For wide schemas in particular, the recursive nature of the current 
> pickling algorithm for record batches makes it less efficient than using the 
> IPC format (which can be done entirely in C++).
>  
> Consider switching the mechanism to use the IPC format.  I think this can be 
> a backwards-compatible change by leaving the current 
> _reconstruct_record_batch in place, if we care about that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
