[jira] [Commented] (ARROW-1860) [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data

ASF GitHub Bot (JIRA) Wed, 24 Jan 2018 12:24:13 -0800

    [ https://issues.apache.org/jira/browse/ARROW-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338164#comment-16338164 ]


ASF GitHub Bot commented on ARROW-1860:
---------------------------------------

wesm opened a new pull request #1500: ARROW-1860: [C++] Introduce 
ipc::PreparedMessage data structure to avoid making multiple passes over record 
batches
URL: https://github.com/apache/arrow/pull/1500
 
 
   The purpose of this is to decompose a record batch into its serialized 
Flatbuffer metadata and the sequence of buffers that form its body, so that the 
total output size can be computed up front, e.g. for writing to a shared memory 
segment. Prior to this, one had to call `GetRecordBatchSize`, allocate memory, 
and then call `WriteRecordBatch`, which duplicates work. 
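
   As a rough illustration of the idea (not the code in this pull request), 
the sketch below stages a message as a metadata blob plus a list of body 
buffers, so that sizing and writing share one preparation step. 
`StagedMessage`, `TotalSize` and `WriteTo` are hypothetical names, plain 
`std::vector` stands in for Arrow buffers, and padding is left out. 

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a staged IPC message; this is NOT Arrow's API.
struct StagedMessage {
  std::vector<uint8_t> metadata;           // serialized Flatbuffer metadata
  std::vector<std::vector<uint8_t>> body;  // buffers that form the message body

  // Total bytes required, computed from the already-prepared pieces.
  int64_t TotalSize() const {
    int64_t size = static_cast<int64_t>(metadata.size());
    for (const auto& buf : body) size += static_cast<int64_t>(buf.size());
    return size;
  }

  // Copy the metadata, then each body buffer, into preallocated memory.
  // Returns the number of bytes written.
  int64_t WriteTo(uint8_t* dest, int64_t capacity) const {
    assert(capacity >= TotalSize());
    int64_t pos = 0;
    auto append = [&](const std::vector<uint8_t>& src) {
      if (!src.empty()) {  // avoid calling memcpy on an empty buffer
        std::memcpy(dest + pos, src.data(), src.size());
        pos += static_cast<int64_t>(src.size());
      }
    };
    append(metadata);
    for (const auto& buf : body) append(buf);
    return pos;
  }
};
```

   A caller would build the staged message once, allocate `TotalSize()` bytes 
(for example in a shared memory segment), and then call `WriteTo` on that 
memory without serializing the record batch a second time. 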
   
   This also changes how padding is handled in unaligned streams, to make the 
message contents deterministic. Record batches are now written the same way 
that tensors already were: alignment bytes are written first to move the 
stream position to a multiple of 8, and only then are the metadata and message 
body written.
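
   The padding rule can be sketched as follows. This is an illustration only: 
`PaddingFor` and `WriteAligned` are hypothetical helpers, a `std::vector` 
stands in for the output stream, and the use of zero bytes for the padding is 
an assumption. 

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int64_t kAlignment = 8;

// Bytes needed to advance `position` to the next multiple of `alignment`
// (which must be a power of two). Returns 0 if already aligned.
inline int64_t PaddingFor(int64_t position, int64_t alignment = kAlignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  const int64_t remainder = position & (alignment - 1);
  return remainder == 0 ? 0 : alignment - remainder;
}

// Write alignment bytes first, then the message (metadata followed by body).
// Returns the aligned offset at which the message starts.
inline int64_t WriteAligned(std::vector<uint8_t>* stream,
                            const std::vector<uint8_t>& message) {
  const int64_t padding = PaddingFor(static_cast<int64_t>(stream->size()));
  // Assumption: padding bytes are zero so the output is reproducible.
  stream->insert(stream->end(), static_cast<std::size_t>(padding), uint8_t{0});
  const int64_t start = static_cast<int64_t>(stream->size());
  stream->insert(stream->end(), message.begin(), message.end());
  return start;
}
```

   Because the padding comes before the message rather than depending on 
what follows it, the bytes of the message itself do not vary with the stream 
position at which writing began. 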
   
   The unaligned stream case isn't being handled consistently on the read path 
yet, so I'll fix that and add a test. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [C++] Add data structure to "stage" a sequence of IPC messages from in-memory 
> data
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-1860
>                 URL: https://issues.apache.org/jira/browse/ARROW-1860
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>         Attachments: text.html
>
>
> Currently, when you need to pre-allocate space for a record batch or a stream 
> (schema + dictionaries + record batches), you must make multiple passes over 
> the data structures of interest (e.g. using {{MockOutputStream}} to compute 
> the size of the output buffer). It would be useful to make a single pass that 
> "prepares" the IPC payload for both sizing and writing, so that the data does 
> not have to be traversed more than once.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
