Hi folks,

I would like to start a public discussion on adding a new array format to
Arrow: the array-view array. The name is also up for debate.

This format is inspired by Velox's ArrayVector format [1]. Logically, this
array represents an array of arrays. Each element is an array-view (offset
and size pair) that points to a range within a nested "values" array
(called "elements" in Velox docs). The nested array can be of any type,
which makes this format very flexible and powerful.

[image: ../_images/array-vector.png]
<https://facebookincubator.github.io/velox/_images/array-vector.png>

I'm currently working on a C++ implementation and plan to work on a Go
implementation to fulfill the two-implementations requirement for format
changes.

The draft design:

- 3 buffers: [validity_bitmap, int32 offsets buffer, int32 sizes buffer]
- 1 child array: "values" as an array of the type parameter

validity_bitmap is used to differentiate between empty array views
(sizes[i] == 0) and NULL array views (validity_bitmap[i] == 0).

When validity_bitmap[i] is 0, both sizes[i] and offsets[i] are undefined
(as usual), and when sizes[i] == 0, offsets[i] is undefined. Writing 0 into
these undefined slots is recommended if doing so is not a burden for the
system producing the arrays.
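
To make the three cases concrete, here is a minimal C++ sketch of how a
reader might classify element i. It is only an illustration of the draft
rules above, not the proposed implementation: ArrayViewBuffers, IsValid,
Classify and ViewKind are hypothetical names, and the bitmap is assumed to
use the same least-significant-bit-first order as Arrow's existing validity
bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the three buffers of the draft layout.
struct ArrayViewBuffers {
  std::vector<uint8_t> validity_bitmap;  // 1 bit per element, LSB first
  std::vector<int32_t> offsets;          // start of each view in "values"
  std::vector<int32_t> sizes;            // number of values in each view
};

enum class ViewKind { kNull, kEmpty, kNonEmpty };

// True when bit i of the validity bitmap is set (element i is not NULL).
inline bool IsValid(const ArrayViewBuffers& b, std::size_t i) {
  return ((b.validity_bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

// Classifies element i without ever reading an undefined slot:
// sizes[i] is only read for valid elements, and offsets[i] is not read here.
inline ViewKind Classify(const ArrayViewBuffers& b, std::size_t i) {
  assert(i < b.sizes.size());
  if (!IsValid(b, i)) return ViewKind::kNull;    // sizes[i], offsets[i] undefined
  if (b.sizes[i] == 0) return ViewKind::kEmpty;  // offsets[i] undefined
  return ViewKind::kNonEmpty;                    // offsets[i] and sizes[i] meaningful
}
```

With a bitmap byte of 0x05, offsets {1, 0, 0} and sizes {2, 0, 0}, element
0 is a non-empty view, element 1 is NULL, and element 2 is an empty view.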

The offsets buffer is not required to be ordered, and views don't have to
be disjoint.
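
As a small illustration of out-of-order and overlapping views, the sketch
below copies one view out of the child array. Materialize is a hypothetical
helper written for this email, the int64 child type is chosen arbitrarily,
and the bounds checks are my assumption about what a reader would want to
validate, not part of the draft.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies the range [offset, offset + size) out of the child "values" array.
// Assumes the caller has already checked that the view is not NULL.
inline std::vector<int64_t> Materialize(const std::vector<int64_t>& values,
                                        int32_t offset, int32_t size) {
  // An empty view has an undefined offset, so return before reading it.
  if (size == 0) return {};
  assert(offset >= 0 && size > 0);
  assert(static_cast<std::size_t>(offset) + static_cast<std::size_t>(size) <=
         values.size());
  return std::vector<int64_t>(values.begin() + offset,
                              values.begin() + offset + size);
}
```

For values {10, 20, 30, 40}, offsets {2, 0, 1} and sizes {2, 3, 2}, the
three views are [30, 40], [10, 20, 30] and [20, 30]: the offsets are not
sorted, and all three views share the value 30.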

[1]
https://facebookincubator.github.io/velox/develop/vectors.html#arrayvector

Thanks,
Felipe O. Carvalho
