[ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089902#comment-17089902 ]
David Li commented on ARROW-5377:
---------------------------------
Looking at this a bit more, would an API roughly like the following work?
{code:cpp}
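// Opaque type; internally holds a std::vector<std::shared_ptr<IpcPayload>>
struct IpcPayloadSequence;

// Disassemble a record batch (or a schema) into IPC payloads exactly once
Result<std::shared_ptr<IpcPayloadSequence>> RecordBatchToPayload(const RecordBatch&);
Result<std::shared_ptr<IpcPayloadSequence>> SchemaToPayload(const Schema&);

// Number of bytes the payloads will occupy in the stream (metadata + padding + body)
int64_t GetPayloadSize(const IpcPayloadSequence&);

class RecordBatchWriter {
 public:
  // Write already-computed payloads without serializing them again
  Status WritePayloads(const IpcPayloadSequence&);
};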
{code}
This avoids exposing IpcPayload directly. (There probably needs to be more
state to help RecordBatchWriter know whether the payload is a schema, has all
the necessary dictionary batches, etc.)
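To make the intended flow concrete (serialize once, query the size, then write into pre-sized memory), here is a self-contained toy sketch. It does not use Arrow: `Payload`, `PayloadSequence`, and the free function `WritePayloads` are hypothetical stand-ins for the IpcPayload / IpcPayloadSequence / RecordBatchWriter names in the proposal, and IPC framing and padding are deliberately left out.

```cpp
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an IPC payload: serialized metadata plus body
// buffers. This is NOT Arrow's real IpcPayload type.
struct Payload {
  std::vector<uint8_t> metadata;
  std::vector<std::vector<uint8_t>> body_buffers;
};

// Hypothetical opaque sequence; callers never inspect the payloads directly.
struct PayloadSequence {
  std::vector<std::shared_ptr<Payload>> payloads;
};

// Total bytes the sequence will occupy once written (no framing or padding
// in this toy model).
int64_t GetPayloadSize(const PayloadSequence& seq) {
  int64_t total = 0;
  for (const auto& p : seq.payloads) {
    total += static_cast<int64_t>(p->metadata.size());
    for (const auto& buf : p->body_buffers) {
      total += static_cast<int64_t>(buf.size());
    }
  }
  return total;
}

// Copies the already-computed payloads into caller-provided memory (e.g. a
// memory map) without re-serializing anything. Throws if the space is too
// small; returns the number of bytes written.
int64_t WritePayloads(const PayloadSequence& seq, uint8_t* dest, int64_t capacity) {
  if (GetPayloadSize(seq) > capacity) {
    throw std::length_error("destination too small for payloads");
  }
  int64_t offset = 0;
  auto append = [&](const std::vector<uint8_t>& bytes) {
    if (!bytes.empty()) std::memcpy(dest + offset, bytes.data(), bytes.size());
    offset += static_cast<int64_t>(bytes.size());
  };
  for (const auto& p : seq.payloads) {
    append(p->metadata);
    for (const auto& buf : p->body_buffers) append(buf);
  }
  return offset;
}
```

The caller would allocate exactly `GetPayloadSize(seq)` bytes and pass them to `WritePayloads`; the disassembly into payloads happens once and is shared by both steps.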
(For context, this would be useful to help limit the on-network message sizes
in Flight, to avoid tripping gRPC's message size limits and to help manage
memory usage in Netty/gRPC.)
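For the Flight case, one way a caller might use such a size query is to slice an oversized batch until every piece fits under a message limit. This is a sketch under assumptions: `SizeFn` is a hypothetical size oracle (in the proposal that role would be played by serializing a slice and calling `GetPayloadSize`), `RowRange` stands in for `RecordBatch::Slice(offset, length)`, and `SplitToFit` is likewise a hypothetical helper.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// A row range [offset, offset + length) of a record batch, standing in for
// RecordBatch::Slice(offset, length).
struct RowRange {
  int64_t offset;
  int64_t length;
};

// Hypothetical size oracle: serialized IPC size of the given slice.
using SizeFn = std::function<int64_t(const RowRange&)>;

// Recursively halve a slice until every piece's serialized size is at most
// max_bytes. Pieces are appended to `out` in row order.
void SplitToFit(const RowRange& range, int64_t max_bytes, const SizeFn& size_of,
                std::vector<RowRange>* out) {
  if (size_of(range) <= max_bytes) {
    out->push_back(range);
    return;
  }
  if (range.length <= 1) {
    // A single row that is still too large cannot be split any further.
    throw std::length_error("single row exceeds message size limit");
  }
  int64_t half = range.length / 2;
  SplitToFit({range.offset, half}, max_bytes, size_of, out);
  SplitToFit({range.offset + half, range.length - half}, max_bytes, size_of, out);
}
```

Halving is only one policy; a caller with a rough per-row byte estimate could instead compute the slice length directly and avoid serializing slices that are then rejected.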
> [C++] Develop interface for writing a RecordBatch IPC stream into
> pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-5377
> URL: https://issues.apache.org/jira/browse/ARROW-5377
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
>
> As discussed in a recent mailing list thread:
> https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E
> The only viable process at the moment for getting an accurate report of
> stream size is to write a simulated stream using {{MockOutputStream}}. This
> is suboptimal for a couple of reasons:
> * Flatbuffers metadata must be created twice
> * Record batch disassembly into IpcPayload must be performed twice
> It seems like an interface with a very constrained public API could be
> provided to deconstruct a sequence of RecordBatches and report the size of
> the produced IPC stream (based on metadata sizes and padding). This
> deconstructed set of IPC payloads could then be written out to a stream (e.g.
> using {{FixedSizeBufferWriter}}).
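The size report described above can be computed from metadata sizes and padding alone, without writing a mock stream. A minimal sketch, assuming the encapsulated message layout from the Arrow IPC format specification (4-byte continuation marker 0xFFFFFFFF, 4-byte metadata length, Flatbuffer metadata padded to an 8-byte boundary, then the body) and assuming each body buffer is padded to a multiple of 8 bytes; `MessageSizes`, `EncapsulatedMessageSize`, and `StreamSize` are hypothetical names, not Arrow APIs.

```cpp
#include <cstdint>
#include <vector>

// Round n up to the next multiple of 8 (valid for n >= 0).
constexpr int64_t PadTo8(int64_t n) { return (n + 7) & ~int64_t{7}; }

// Hypothetical summary of one message: the Flatbuffer metadata size plus the
// unpadded sizes of its body buffers.
struct MessageSizes {
  int64_t flatbuffer_size;
  std::vector<int64_t> buffer_sizes;
};

// Bytes one encapsulated message occupies: the 8-byte prefix (continuation
// marker + metadata length) and the metadata are padded together to an 8-byte
// boundary, then each body buffer is padded to a multiple of 8 bytes.
int64_t EncapsulatedMessageSize(const MessageSizes& m) {
  int64_t size = PadTo8(8 + m.flatbuffer_size);
  for (int64_t b : m.buffer_sizes) size += PadTo8(b);
  return size;
}

// Size of a whole stream: every message (schema, dictionary batches, record
// batches) plus the 8-byte end-of-stream marker (0xFFFFFFFF then 0x00000000).
int64_t StreamSize(const std::vector<MessageSizes>& messages) {
  int64_t total = 0;
  for (const auto& m : messages) total += EncapsulatedMessageSize(m);
  return total + 8;
}
```

For example, a schema message with 200 bytes of metadata plus one batch with 100 bytes of metadata and buffers of 10, 0, and 24 bytes gives 208 + 152 + 8 = 368 bytes.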