[ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089902#comment-17089902 ]

David Li commented on ARROW-5377:
---------------------------------

Looking at this a bit more, would an API roughly like the following work?
{code:cpp}
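// Opaque to callers; internally a std::vector<std::shared_ptr<IpcPayload>>.
struct IpcPayloadSequence;

Result<std::shared_ptr<IpcPayloadSequence>> RecordBatchToPayload(const RecordBatch& batch);
Result<std::shared_ptr<IpcPayloadSequence>> SchemaToPayload(const Schema& schema);
// Number of bytes the payloads would occupy once written to an IPC stream.
int64_t GetPayloadSize(const IpcPayloadSequence& payloads);

class RecordBatchWriter {
 public:
  // Writes already-disassembled payloads without serializing them again.
  Status WritePayloads(const IpcPayloadSequence& payloads);
};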
{code}
This avoids exposing IpcPayload directly. (The sequence probably needs to carry
more state so that RecordBatchWriter knows whether the payload is a schema,
whether it includes all the necessary dictionary batches, etc.)

(For context, this would help limit on-network message sizes in Flight, both to
avoid tripping gRPC's message size limits and to help manage memory usage in
Netty/gRPC.)
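As a rough illustration of the intended flow, the sketch below disassembles a batch once, checks the resulting size against a per-message limit, and then hands the same payloads to the writer, rather than serializing twice as with {{MockOutputStream}}. It is a self-contained mock, not Arrow code: the types borrow the names from the proposal above, but {{MockRecordBatch}}, {{WriteIfFits}}, the {{encoded_size}} field, the flat 256-byte metadata estimate and the 8-byte buffer padding are all assumptions, and {{bool}}/{{std::shared_ptr}} stand in for Arrow's {{Status}}/{{Result}}.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Mock types that borrow names from the proposal; they are NOT Arrow's classes.
// A "batch" is reduced to the sizes of its column buffers, in bytes.
struct MockRecordBatch {
  std::vector<int64_t> buffer_sizes;
};

struct IpcPayload {
  int64_t encoded_size;  // bytes this message occupies once written
};

// Opaque in the proposal; the vector is visible here only for brevity.
struct IpcPayloadSequence {
  std::vector<std::shared_ptr<IpcPayload>> payloads;
};

// Counts how often the (expensive) disassembly step runs.
int g_disassembly_count = 0;

std::shared_ptr<IpcPayloadSequence> RecordBatchToPayload(const MockRecordBatch& batch) {
  ++g_disassembly_count;
  const int64_t kAssumedFramedMetadata = 8 + 256;  // prefix + padded flatbuffer (assumed)
  int64_t size = kAssumedFramedMetadata;
  for (int64_t n : batch.buffer_sizes) size += (n + 7) / 8 * 8;  // pad each buffer to 8 bytes
  auto seq = std::make_shared<IpcPayloadSequence>();
  seq->payloads.push_back(std::make_shared<IpcPayload>(IpcPayload{size}));
  return seq;
}

int64_t GetPayloadSize(const IpcPayloadSequence& seq) {
  int64_t total = 0;
  for (const auto& p : seq.payloads) total += p->encoded_size;
  return total;
}

// Mock writer that only tallies bytes instead of sending them.
class RecordBatchWriter {
 public:
  bool WritePayloads(const IpcPayloadSequence& seq) {
    bytes_written_ += GetPayloadSize(seq);
    return true;
  }
  int64_t bytes_written() const { return bytes_written_; }

 private:
  int64_t bytes_written_ = 0;
};

// Disassemble once, compare against the per-message limit, then reuse the
// same payloads for the write. Returns false if the caller should split the batch.
bool WriteIfFits(const MockRecordBatch& batch, int64_t max_message_bytes,
                 RecordBatchWriter* writer) {
  assert(writer != nullptr);
  std::shared_ptr<IpcPayloadSequence> seq = RecordBatchToPayload(batch);
  if (GetPayloadSize(*seq) > max_message_bytes) return false;
  return writer->WritePayloads(*seq);
}
```

A caller that hits the {{false}} branch could split the batch (e.g. with {{RecordBatch::Slice}}) and retry. gRPC's default maximum receive message size is 4 MiB, so a budget of that order is a natural choice for Flight.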

> [C++] Develop interface for writing a RecordBatch IPC stream into 
> pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5377
>                 URL: https://issues.apache.org/jira/browse/ARROW-5377
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> As discussed in a recent mailing list thread
> https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E
> The only viable way at the moment to get an accurate report of the stream
> size is to write a simulated stream using {{MockOutputStream}}. This is
> suboptimal for a couple of reasons:
> * Flatbuffers metadata must be created twice
> * Record batch disassembly into IpcPayload must be performed twice
> It seems like an interface with a very constrained public API could be
> provided to deconstruct a sequence of RecordBatches and report the size of
> the produced IPC stream (based on metadata sizes and padding). This
> deconstructed set of IPC payloads could then be written out to a stream
> (e.g. using {{FixedSizeBufferWriter}}).
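The size report described in the issue only needs per-message metadata and body sizes. The sketch below follows the encapsulated message layout from the Arrow columnar format specification (a 4-byte 0xFFFFFFFF continuation marker, a 4-byte metadata length, the flatbuffer padded to 8 bytes, then the body) plus the 8-byte end-of-stream marker. It computes the stream size up front and then writes the framing into exactly that much pre-allocated memory. {{MessageSizes}}, {{FixedBufferSink}} and the other helpers are hypothetical stand-ins (the sink only mimics the bounds checking of {{FixedSizeBufferWriter}}); metadata and body contents are written as zero placeholders, and 8-byte padding of each body buffer is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical summary of one IPC message (schema, dictionary or record batch).
struct MessageSizes {
  int64_t flatbuffer_size;            // unpadded size of the Message flatbuffer
  std::vector<int64_t> body_buffers;  // unpadded sizes of the body buffers
};

// Round n up to a multiple of `alignment`, which must be a power of two.
int64_t PaddedLength(int64_t n, int64_t alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (n + alignment - 1) & ~(alignment - 1);
}

// 4-byte continuation marker + 4-byte length + padded flatbuffer + body,
// assuming each body buffer is padded to 8 bytes.
int64_t EncapsulatedMessageSize(const MessageSizes& m) {
  int64_t size = 8 + PaddedLength(m.flatbuffer_size, 8);
  for (int64_t b : m.body_buffers) size += PaddedLength(b, 8);
  return size;
}

// All messages in order, followed by the 8-byte end-of-stream marker.
int64_t StreamSize(const std::vector<MessageSizes>& messages) {
  int64_t total = 8;
  for (const auto& m : messages) total += EncapsulatedMessageSize(m);
  return total;
}

// Bounded sink over pre-allocated memory: never grows, throws on overflow.
class FixedBufferSink {
 public:
  FixedBufferSink(uint8_t* data, int64_t capacity) : data_(data), capacity_(capacity) {}

  void WriteZeros(int64_t n) {
    Reserve(n);
    std::memset(data_ + position_, 0, static_cast<size_t>(n));
    position_ += n;
  }

  // Little-endian regardless of host byte order.
  void WriteUInt32LE(uint32_t v) {
    Reserve(4);
    for (int i = 0; i < 4; ++i) {
      data_[position_ + i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
    }
    position_ += 4;
  }

  int64_t position() const { return position_; }

 private:
  void Reserve(int64_t n) const {
    if (position_ + n > capacity_) {
      throw std::out_of_range("write exceeds pre-allocated space");
    }
  }

  uint8_t* data_;
  int64_t capacity_;
  int64_t position_ = 0;
};

// Writes only the framing; metadata and body bytes are zero placeholders.
void WriteFramingOnly(const std::vector<MessageSizes>& messages, FixedBufferSink* sink) {
  for (const auto& m : messages) {
    const int64_t padded_metadata = PaddedLength(m.flatbuffer_size, 8);
    sink->WriteUInt32LE(0xFFFFFFFFu);                             // continuation marker
    sink->WriteUInt32LE(static_cast<uint32_t>(padded_metadata));  // metadata length
    sink->WriteZeros(padded_metadata);
    for (int64_t b : m.body_buffers) sink->WriteZeros(PaddedLength(b, 8));
  }
  sink->WriteUInt32LE(0xFFFFFFFFu);  // end-of-stream: continuation marker
  sink->WriteUInt32LE(0);            // followed by a zero length
}
```

For example, a 100-byte schema flatbuffer with no body takes 8 + 104 = 112 bytes, and a 150-byte record batch flatbuffer with body buffers of 10, 24 and 3 bytes takes 8 + 152 + 48 = 208 bytes, so with the end-of-stream marker the stream needs 328 bytes. Because the sink throws instead of growing, any mismatch between the computed size and the bytes actually produced surfaces immediately, which is the property a memory-mapped destination needs.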



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
