Small bikeshed: to keep naming consistent, should this be "ViewList"?

On Wed, Apr 26, 2023 at 8:02 AM Weston Pace <weston.p...@gmail.com> wrote:

> > My understanding is that the primary benefit of this ListView layout
> > over Arrow's existing List layouts [1] is that ListView allows for
> > buffer alignment [2] without padding, which makes vectorized
> > processing much more efficient. Is this understanding correct?
>
> Yes. Though proponents of list-view would probably point out that it
> doesn't prevent you from having contiguous buffers, it simply doesn't
> require it.
>
> > Unless I am missing something, I think the selection use-case
> > could be equally well served by a dictionary-encoded
> > BinaryArray/ListArray, and would have the benefit of not requiring
> > any modifications to the existing format or kernels.
>
> This is a good point that did not come up in the previous discussion that I
> can see.
>
> > The major additional flexibility of the proposed encoding would be
> > permitting disjoint or overlapping ranges, are these common enough
> > in practice to represent a meaningful bottleneck?
>
> I'm not sure. There was one other use case that was brought up in the
> original discussion. This was that list view arrays can be constructed in
> parallel. That is, if you know the output size (e.g. when applying a large
> scalar function), then you can have different threads fill out different
> regions of the offsets / lengths buffers. That being said, I don't know
> for certain if anyone is relying on this behavior.
>
>
> On Wed, Apr 26, 2023 at 7:12 AM Felipe Oliveira Carvalho <
> felipe...@gmail.com> wrote:
>
> > After Weston's suggestion above, I've renamed files and classes in my WIP
> > implementation:
> >
> > ArrayView -> ListView
> >
> > On Wed, Apr 26, 2023 at 11:08 AM Ian Cook <i...@ursacomputing.com> wrote:
> >
> > > +1 to what Weston and Joris suggested regarding the name. "ListView"
> > > seems like the best name to use for this layout in Arrow.
> > >
> > > My understanding is that the primary benefit of this ListView layout
> > > over Arrow's existing List layouts [1] is that ListView allows for
> > > buffer alignment [2] without padding, which makes vectorized
> > > processing much more efficient. Is this understanding correct?
> > >
> > > [1] https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout
> > > [2] https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding
> > >
> > > Ian
> > >
> > > On Wed, Apr 26, 2023 at 5:27 AM Joris Van den Bossche
> > > <jorisvandenboss...@gmail.com> wrote:
> > > >
> > > > On Wed, 26 Apr 2023 at 02:37, Weston Pace <weston.p...@gmail.com> wrote:
> > > > >
> > > > > For context, there was some discussion on this back in [1]. At that
> > > > > time this was called "sequence view" but I do not like that name.
> > > > > However, array-view array is a little confusing. Given this is
> > > > > similar to list can we go with list-view array?
> > > >
> > > > Yes, given that this is essentially an alternative representation of a
> > > > logical "list" array, I would also prefer that we use the term "list"
> > > > in the name for such a new type. The word "array" has a different
> > > > meaning in context of our columnar specification.
> > >
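
For anyone skimming the thread, the parallel-construction use case Weston mentions can be sketched roughly as follows. This is only an illustration in plain Python, not Arrow code: `build_list_view` is a hypothetical helper, the buffers are ordinary Python lists, and it assumes the size of every output row is known up front. Each worker owns a range of rows plus a private region of the values buffer, and the regions are deliberately assigned in reverse order to show that offsets do not have to be monotonically increasing.

```python
from concurrent.futures import ThreadPoolExecutor


def build_list_view(rows, num_workers=2):
    """Fill (offsets, sizes, values) buffers for a list-view style layout.

    Hypothetical helper for illustration only; assumes every row's length
    is known before the workers start.
    """
    n = len(rows)
    offsets = [0] * n
    sizes = [0] * n
    values = [None] * sum(len(r) for r in rows)

    # Split the rows into one contiguous chunk per worker.
    chunk = max(1, (n + num_workers - 1) // num_workers)
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    # Give each chunk a private region of `values`. Regions are handed out
    # in reverse chunk order on purpose: the offsets end up out of order.
    starts = [0] * len(bounds)
    pos = 0
    for idx in reversed(range(len(bounds))):
        lo, hi = bounds[idx]
        starts[idx] = pos
        pos += sum(len(rows[i]) for i in range(lo, hi))

    def fill(lo, hi, start):
        # Each worker writes only its own slots of offsets/sizes/values.
        cursor = start
        for i in range(lo, hi):
            offsets[i] = cursor
            sizes[i] = len(rows[i])
            for k, v in enumerate(rows[i]):
                values[cursor + k] = v
            cursor += sizes[i]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fill, lo, hi, st)
                   for (lo, hi), st in zip(bounds, starts)]
        for f in futures:
            f.result()  # re-raise any worker exception

    return offsets, sizes, values


def get_row(offsets, sizes, values, i):
    """Read logical row i back out of the three buffers."""
    return values[offsets[i]:offsets[i] + sizes[i]]
```

Reading row `i` only needs `offsets[i]` and `sizes[i]`, so the order in which the workers laid out the values does not matter to a consumer. With the offsets-only List layout, each offset is a running total over all earlier rows, which is why the same independent fill is harder there.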