My apologies, I did not see the thread [1] for some reason.

[1] https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb

On Thu, Apr 27, 2023 at 10:32 AM Andrew Lamb <al...@influxdata.com> wrote:

> Felipe, thank you for bringing this up.
>
> Another approach that is sometimes used in database engines (like DuckDB),
> often called selection vectors, is to store another bitmask that says
> which elements in the array should be "selected" and which are ignored.
> It functions like a view.
>
> For example, a selection vector {0, 1, 1, 0, 1} would represent a view of
> the second, third, and fifth rows.
>
> I think the selection vector is as general as the ArrayVector format you
> describe, and likely simpler to implement (especially in compute kernels).
> The downside is that for very sparse selections on very large arrays, the
> size of the selection vector may be larger than the array view.
>
> Have you considered such an approach?
>
> Andrew
>
> On Wed, Apr 26, 2023 at 1:27 AM wish maple <maplewish...@gmail.com> wrote:
>
>> I think the ArrayVector can have benefits above:
>> 1. Converting a batch in Velox or another system to an Arrow array could
>> be much more lightweight.
>> 2. Modifying, filtering, and copying an array or string could be much
>> more lightweight.
>>
>> Velox can make a Vector mutable; it seems that an Arrow array cannot. It
>> seems this makes little difference here.
>>
>> On 2023/04/25 22:00:08 Felipe Oliveira Carvalho wrote:
>> > Hi folks,
>> >
>> > I would like to start a public discussion on the inclusion of a new
>> > array format to Arrow: array-view array. The name is also up for debate.
>> >
>> > This format is inspired by Velox's ArrayVector format [1]. Logically,
>> > this array represents an array of arrays. Each element is an array-view
>> > (offset and size pair) that points to a range within a nested "values"
>> > array (called "elements" in Velox docs). The nested array can be of any
>> > type, which makes this format very flexible and powerful.
>> >
>> > [image: ../_images/array-vector.png]
>> > <https://facebookincubator.github.io/velox/_images/array-vector.png>
>> >
>> > I'm currently working on a C++ implementation and plan to work on a Go
>> > implementation to fulfill the two-implementations requirement for format
>> > changes.
>> >
>> > The draft design:
>> >
>> > - 3 buffers: [validity_bitmap, int32 offsets buffer, int32 sizes buffer]
>> > - 1 child array: "values" as an array of the type parameter
>> >
>> > validity_bitmap is used to differentiate between empty array views
>> > (sizes[i] == 0) and NULL array views (validity_bitmap[i] == 0).
>> >
>> > When the validity_bitmap[i] is 0, both sizes and offsets are undefined
>> > (as usual), and when sizes[i] == 0, offsets[i] is undefined. 0 is
>> > recommended if setting a value is not an issue to the system producing
>> > the arrays.
>> >
>> > offsets buffer is not required to be ordered and views don't have to be
>> > disjoint.
>> >
>> > [1]
>> > https://facebookincubator.github.io/velox/develop/vectors.html#arrayvector
>> >
>> > Thanks,
>> > Felipe O. Carvalho
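
To make the two layouts discussed in the quoted thread concrete, here is a rough Python sketch. It is only an illustration under assumptions, not the proposed Arrow implementation: plain Python lists stand in for the buffers, validity is a list of 0/1 values rather than a packed bitmap, and the function names resolve_views and apply_selection are hypothetical.

```python
from typing import List, Optional, Sequence


def resolve_views(
    validity: Sequence[int],
    offsets: Sequence[int],
    sizes: Sequence[int],
    values: Sequence[int],
) -> List[Optional[List[int]]]:
    """Materialize each array view as a list (None for a NULL view).

    Hypothetical helper: lists stand in for the three buffers and the
    "values" child array of the draft design.
    """
    result: List[Optional[List[int]]] = []
    for valid, offset, size in zip(validity, offsets, sizes):
        if not valid:
            # NULL view: offset and size are undefined, so both are ignored.
            result.append(None)
        elif size == 0:
            # Empty view: the offset is undefined, so it is ignored.
            result.append([])
        else:
            result.append(list(values[offset:offset + size]))
    return result


def apply_selection(mask: Sequence[int], rows: Sequence[int]) -> List[int]:
    """Keep rows[i] wherever mask[i] is 1 (the selection-vector idea)."""
    return [row for bit, row in zip(mask, rows) if bit]


values = [10, 20, 30, 40, 50]
validity = [1, 1, 0, 1, 1]
offsets = [2, 0, 0, 1, 0]  # not ordered: view 0 starts after view 1
sizes = [3, 2, 0, 3, 0]    # views 0 and 3 overlap on values[2:4]

views = resolve_views(validity, offsets, sizes, values)
selected = apply_selection([0, 1, 1, 0, 1], values)
```

Here views holds one NULL entry (index 2) and one empty entry (index 4), which is the distinction the validity bitmap exists to preserve. The selection mask is applied to a flat list of rows and keeps the second, third, and fifth.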