[
https://issues.apache.org/jira/browse/KUDU-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Henke resolved KUDU-2077.
-------------------------------
Fix Version/s: 1.12.0
Assignee: Todd Lipcon
Resolution: Fixed
> Return data in Apache Arrow format
> ----------------------------------
>
> Key: KUDU-2077
> URL: https://issues.apache.org/jira/browse/KUDU-2077
> Project: Kudu
> Issue Type: New Feature
> Components: client, server
> Reporter: Andrew Wong
> Assignee: Todd Lipcon
> Priority: Major
> Labels: roadmap-candidate
> Fix For: 1.12.0
>
>
> Dan and I spent the hackathon tinkering with the Apache Arrow format. Arrow
> is an in-memory columnar format designed to be the common data format for a
> large number of projects, see [here|https://arrow.apache.org]. One place we
> thought adding this would be particularly fitting is when sending results
> back to the client, since this currently returns row-wise data. By returning
> Arrow, this could open the door to simpler and faster integration with other
> projects.
> The server-side changes can be localized to the tablet service and wire
> protocol. We considered using Arrow more exhaustively throughout the server
> codebase, but found that because Arrow and Kudu's own in-memory format (i.e.
> that in kudu::ColumnBlock) are so similar, a simpler approach is to copy the
> buffers from ColumnBlock to the scan response and build arrow::Arrays
> client-side. A POC of the server-side changes can be found here:
> https://github.com/danburkert/kudu/tree/arrow
> At the time of writing this, the arrow::Array type has a varying number of
> arrow::Buffers, depending on the data type (e.g. one for null bitmaps, one
> for data, etc). The ColumnBlock "buffers" (i.e. data, null_bitmap) should be
> compatible with these Buffers with a couple of modifications:
> * The null-bitmaps in arrow are the complement of those used by Kudu
> * The RowBlock that owns the ColumnBlocks has a selection vector needs to be
> accounted for
> If the buffers are transferred over the wire (via sidecars or protobuf), they
> should be able to be converted to Arrays via arrow::ArrayData or directly via
> the Array constructors.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)