My apologies, I did not see the thread [1] for some reason.

[1] https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb

On Thu, Apr 27, 2023 at 10:32 AM Andrew Lamb <al...@influxdata.com> wrote:

> Felipe, thank you for bringing this up.
>
> Another approach, sometimes used in database engines (like DuckDB) and
> often called a selection vector, is to store a separate bitmask that says
> which elements of the array are "selected" and which are ignored. The
> bitmask then functions like a view over the array.
>
> For example, a selection vector {0, 1, 1, 0, 1} would represent a view of
> the second, third, and fifth rows.
>
> I think the selection vector is as general as the ArrayVector format you
> describe, and likely simpler to implement (especially in compute kernels).
> The downside is that for very sparse selections on very large arrays, the
> size of the selection vector may be larger than the array view.
>
> Have you considered such an approach?
>
> Andrew
>
> On Wed, Apr 26, 2023 at 1:27 AM wish maple <maplewish...@gmail.com> wrote:
>
>> I think the ArrayVector format can have the following benefits:
>> 1. Converting a batch in Velox or another system to an Arrow array could
>>    be much more lightweight.
>> 2. Modifying, filtering, and copying an array or string could be much
>>    more lightweight.
>>
>> Velox can make a Vector mutable, while it seems that an Arrow array cannot
>> be. It seems this makes little difference here.
>>
>> On 2023/04/25 22:00:08 Felipe Oliveira Carvalho wrote:
>> > Hi folks,
>> >
>> > I would like to start a public discussion on the inclusion of a new array
>> > format to Arrow — array-view array. The name is also up for debate.
>> >
>> > This format is inspired by Velox's ArrayVector format [1]. Logically, this
>> > array represents an array of arrays. Each element is an array-view (offset
>> > and size pair) that points to a range within a nested "values" array
>> > (called "elements" in Velox docs). The nested array can be of any type,
>> > which makes this format very flexible and powerful.
>> >
>> > [image: ../_images/array-vector.png]
>> > <https://facebookincubator.github.io/velox/_images/array-vector.png>
>> >
>> > I'm currently working on a C++ implementation and plan to work on a Go
>> > implementation to fulfill the two-implementations requirement for format
>> > changes.
>> >
>> > The draft design:
>> >
>> > - 3 buffers: [validity_bitmap, int32 offsets buffer, int32 sizes buffer]
>> > - 1 child array: "values" as an array of the type parameter
>> >
>> > validity_bitmap is used to differentiate between empty array views
>> > (sizes[i] == 0) and NULL array views (validity_bitmap[i] == 0).
>> >
>> > When validity_bitmap[i] is 0, both sizes[i] and offsets[i] are undefined
>> > (as usual), and when sizes[i] == 0, offsets[i] is undefined. Setting these
>> > to 0 is recommended if setting a value is not an issue for the system
>> > producing the arrays.
>> >
>> > The offsets buffer is not required to be ordered, and views don't have to
>> > be disjoint.
>> >
>> > [1] https://facebookincubator.github.io/velox/develop/vectors.html#arrayvector
>> >
>> > Thanks,
>> > Felipe O. Carvalho
>> >
>>
>
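To make the selection-vector idea quoted above concrete, here is a minimal sketch in plain Python. It is only an illustration of the bitmask-as-view semantics, not how DuckDB or Arrow implement it; the function name `apply_selection` and the sample rows are hypothetical.

```python
def apply_selection(values, selection):
    """Return the elements of `values` whose selection flag is non-zero.

    `selection` is a hypothetical bitmask with one 0/1 entry per element.
    """
    if len(values) != len(selection):
        raise ValueError("values and selection must have the same length")
    return [v for v, keep in zip(values, selection) if keep]


rows = ["a", "b", "c", "d", "e"]
selection = [0, 1, 1, 0, 1]  # the example mask from the quoted message
selected = apply_selection(rows, selection)  # second, third, and fifth rows
```

Note that the mask always has one entry per element of the underlying array, which is the size trade-off mentioned above for very sparse selections.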

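The draft array-view layout quoted above (validity bitmap, offsets, sizes, and a child "values" array) can be sketched the same way. This is an assumption-laden illustration using Python lists in place of Arrow buffers; `resolve_views` and the sample data are hypothetical and not part of any Arrow or Velox API.

```python
def resolve_views(validity, offsets, sizes, values):
    """Materialize each array view as a Python list (or None for NULL)."""
    out = []
    for valid, offset, size in zip(validity, offsets, sizes):
        if not valid:
            # NULL view: offsets[i] and sizes[i] are undefined, so ignore them.
            out.append(None)
        elif size == 0:
            # Empty view: offsets[i] is undefined, so it is not read.
            out.append([])
        else:
            out.append(values[offset:offset + size])
    return out


values = [10, 20, 30, 40, 50]   # child "values" array
validity = [1, 1, 0, 1]         # third view is NULL
offsets = [2, 0, 0, 1]          # not ordered
sizes = [3, 0, 0, 2]            # second view is empty
views = resolve_views(validity, offsets, sizes, values)
```

In this sample the first and last views overlap on the value 30, which the draft allows since views don't have to be disjoint.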