[ https://issues.apache.org/jira/browse/ARROW-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898898#comment-16898898 ]
Nick Poorman commented on ARROW-6107:
-------------------------------------
https://issues.apache.org/jira/browse/ARROW-4852 is the same use case I'm
thinking of.
If you have an Arrow Table in C (or Python) and you want to access the data in
Go, you can pass a pointer back from C to the underlying data buffers. However,
you still have to collect all the metadata to make use of those buffers. Each
CGO call is slow, so passing just two pointers, one to the data buffers and one
to the serialized metadata, would keep the cost of crossing the language
boundary roughly constant instead of growing with the amount of metadata.
I did a simple POC to demonstrate what it would take to collect all the
information from Python and re-materialize it in Go.
[https://github.com/nickpoorman/go-py-arrow-bridge] The bottleneck is the
number of CGO calls required to fetch all the metadata.
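
As a rough sketch of the receiving side, assuming the metadata arrives as one
serialized blob next to one shared data region: the blob is decoded in a single
pass and each buffer becomes a zero-copy slice of the shared region. The
`bufferDesc` layout and the function names are hypothetical and exist only for
this illustration; they are not Arrow's flatbuffer schema and not the POC's
code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bufferDesc locates one buffer inside the shared data region.
// Hypothetical layout for illustration only, not Arrow's metadata format.
type bufferDesc struct {
	Offset uint64
	Length uint64
}

// encodeDescs packs the descriptors into one blob: a uint32 count followed
// by (offset, length) pairs, all little-endian.
func encodeDescs(descs []bufferDesc) []byte {
	out := make([]byte, 4+16*len(descs))
	binary.LittleEndian.PutUint32(out, uint32(len(descs)))
	for i, d := range descs {
		binary.LittleEndian.PutUint64(out[4+16*i:], d.Offset)
		binary.LittleEndian.PutUint64(out[12+16*i:], d.Length)
	}
	return out
}

// decodeDescs parses the whole blob in one pass, so the number of boundary
// crossings does not depend on how many buffers are described.
func decodeDescs(meta []byte) ([]bufferDesc, error) {
	if len(meta) < 4 {
		return nil, fmt.Errorf("metadata too short: %d bytes", len(meta))
	}
	n := int(binary.LittleEndian.Uint32(meta))
	if len(meta) < 4+16*n {
		return nil, fmt.Errorf("metadata truncated: want %d descriptors", n)
	}
	descs := make([]bufferDesc, n)
	for i := range descs {
		descs[i].Offset = binary.LittleEndian.Uint64(meta[4+16*i:])
		descs[i].Length = binary.LittleEndian.Uint64(meta[12+16*i:])
	}
	return descs, nil
}

// sliceBuffers returns views into data without copying any bytes.
func sliceBuffers(data []byte, descs []bufferDesc) ([][]byte, error) {
	views := make([][]byte, len(descs))
	for i, d := range descs {
		end := d.Offset + d.Length
		if end < d.Offset || end > uint64(len(data)) {
			return nil, fmt.Errorf("buffer %d out of range", i)
		}
		views[i] = data[d.Offset:end]
	}
	return views, nil
}

func main() {
	data := []byte("abcdefgh") // stand-in for the shared data region
	meta := encodeDescs([]bufferDesc{{Offset: 0, Length: 3}, {Offset: 3, Length: 5}})

	descs, err := decodeDescs(meta)
	if err != nil {
		panic(err)
	}
	views, err := sliceBuffers(data, descs)
	if err != nil {
		panic(err)
	}
	for i, v := range views {
		fmt.Printf("buffer %d: %q\n", i, v)
	}
}
```

In a real bridge the two byte slices would be built from the two pointers
handed over by C, so only that handoff would involve CGO.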
> [Go] ipc.Writer Option to skip appending data buffers
> -----------------------------------------------------
>
> Key: ARROW-6107
> URL: https://issues.apache.org/jira/browse/ARROW-6107
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Go
> Reporter: Nick Poorman
> Priority: Minor
>
> For cases where we have a known shared memory region, it would be great if
> the ipc.Writer (and by extension ipc.Reader?) had the ability to write out
> everything but the actual buffers holding the data. That way we can still
> utilize the ipc mechanisms to communicate without having to serialize all the
> underlying data across the wire.
>
> This seems like it should be possible since the `RecordBatch` flatbuffers
> only contain the metadata and the underlying data buffers are appended later.
> We just need to skip appending the underlying data buffers.
>
> [~sbinet] thoughts?
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)