[ https://issues.apache.org/jira/browse/ARROW-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338164#comment-16338164 ]
ASF GitHub Bot commented on ARROW-1860:
---------------------------------------
wesm opened a new pull request #1500: ARROW-1860: [C++] Introduce
ipc::PreparedMessage data structure to avoid making multiple passes over record
batches
URL: https://github.com/apache/arrow/pull/1500
The purpose of this is to decompose a record batch into its serialized
Flatbuffer metadata and the sequence of buffers that form its body, so that the
total output size can be computed up front, e.g. for writing to a shared memory
segment. Prior to this, one would have to call `GetRecordBatchSize`, allocate
memory, and then call `WriteRecordBatch`, which duplicates work.
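To illustrate the idea of staging a message once and then using it for both sizing and writing, here is a minimal standalone sketch. It does not use Arrow's API: the `PreparedMessage` struct below and the `TotalSize` and `WriteTo` helpers are hypothetical stand-ins, and the sketch ignores any padding between buffers that a real IPC writer might add.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a staged IPC message (not Arrow's actual type).
struct PreparedMessage {
  std::vector<uint8_t> metadata;           // serialized metadata bytes
  std::vector<std::vector<uint8_t>> body;  // body buffers, in write order
};

// Total bytes needed for the message, computed from the staged pieces
// without walking the original record batch again.
inline int64_t TotalSize(const PreparedMessage& msg) {
  int64_t size = static_cast<int64_t>(msg.metadata.size());
  for (const auto& buf : msg.body) {
    size += static_cast<int64_t>(buf.size());
  }
  return size;
}

// Copies metadata followed by each body buffer into a pre-allocated
// destination (e.g. a shared memory segment). Returns bytes written.
inline int64_t WriteTo(const PreparedMessage& msg, uint8_t* dst,
                       int64_t capacity) {
  assert(capacity >= TotalSize(msg));
  int64_t pos = 0;
  auto copy = [&](const std::vector<uint8_t>& src) {
    if (!src.empty()) {
      std::memcpy(dst + pos, src.data(), src.size());
      pos += static_cast<int64_t>(src.size());
    }
  };
  copy(msg.metadata);
  for (const auto& buf : msg.body) {
    copy(buf);
  }
  return pos;
}
```

A caller would build the staged message once, allocate `TotalSize(msg)` bytes, and then call `WriteTo`, so the source data is traversed only during staging.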
This also changes how padding is handled in unaligned streams to make the
message contents deterministic. Record batches are now written the same way
tensors already were: alignment bytes are written first to move the stream
position to a multiple of 8, and then the metadata and message body are written.
The unaligned stream case isn't being handled consistently on the read path
yet, so I'll fix that and add a test.
> [C++] Add data structure to "stage" a sequence of IPC messages from in-memory
> data
> ----------------------------------------------------------------------------------
>
> Key: ARROW-1860
> URL: https://issues.apache.org/jira/browse/ARROW-1860
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: text.html
>
>
> Currently, when you need to pre-allocate space for a record batch or a stream
> (schema + dictionaries + record batches), you must make multiple passes over
> the data structures of interest (and use e.g. {{MockOutputStream}} to compute
> the size of the output buffer). It would be useful to make a single pass that
> "prepares" the IPC payload for both sizing and writing.