[
https://issues.apache.org/jira/browse/ARROW-16289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526627#comment-17526627
]
Weston Pace commented on ARROW-16289:
-------------------------------------
CC [~lidavidm] [~edponce] [~apitrou] [~michalno] [~yibocai]
> [C++] (eventually) abandon scalar columns of an ExecBatch in favor of RLE
> encoded arrays
> ----------------------------------------------------------------------------------------
>
> Key: ARROW-16289
> URL: https://issues.apache.org/jira/browse/ARROW-16289
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> This JIRA is a proposal / discussion. I am not asserting this is the way to
> go but I would like to consider it.
> From the execution engine's perspective, an exec batch's columns are always
> either arrays or scalars. The only time we make use of scalars today is for
> the four augmented columns (e.g. __filename). Once we have support for RLE
> arrays, a scalar could easily be encoded as an RLE array, and there would be
> no need to use scalars here.
> The advantage would be reducing the complexity in exec nodes and avoiding
> issues like ARROW-16288. It is already rather difficult to explain the idea
> of a "scalar" and "vector" function and then have to turn around and explain
> that the word "scalar" has an entirely different meaning when talking about
> field shape.
> I think it's worth considering taking this even further and removing the
> concept from the compute layer entirely. Kernel functions that want to have
> special logic for scalars could do so using the RLE array. This would be a
> significant change to many kernels, which currently declare the ANY shape and
> determine which logic to apply within the kernel itself (i.e. there is one
> array-or-scalar kernel rather than one kernel for each).
> Admittedly, there are probably a few more instructions and a few more bytes
> needed to handle an RLE scalar than the scalar we have today. However, these
> are just different flavors of O(1) and are not likely to have a significant
> impact.
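
To make the proposal above concrete, here is a minimal sketch of how a scalar column could be represented as a single-run RLE array, and how a kernel could keep a scalar fast path by checking for that case. It does not use Arrow's actual classes or memory layout: RleInt64Column, MakeConstant, and Add are hypothetical names invented for illustration, and the run-ends layout is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified run-length-encoded int64 column (not Arrow's layout).
// run_ends[i] is the exclusive logical end index of run i; values[i] is its value.
struct RleInt64Column {
  std::vector<int64_t> run_ends;
  std::vector<int64_t> values;

  // Logical length is the end of the last run (0 if there are no runs).
  int64_t length() const { return run_ends.empty() ? 0 : run_ends.back(); }

  // A column made of exactly one run plays the role of today's scalar column.
  bool is_constant() const { return run_ends.size() == 1; }

  // Logical lookup: binary search for the first run whose end is past i.
  // Assumes 0 <= i < length().
  int64_t at(int64_t i) const {
    auto it = std::upper_bound(run_ends.begin(), run_ends.end(), i);
    return values[static_cast<std::size_t>(it - run_ends.begin())];
  }
};

// Encode a "scalar" as one run covering the whole batch (assumes length > 0).
inline RleInt64Column MakeConstant(int64_t value, int64_t length) {
  return RleInt64Column{{length}, {value}};
}

// Hypothetical kernel: element-wise add of a dense column and an RLE column.
// The single-run check stands in for the "scalar" special case.
inline std::vector<int64_t> Add(const std::vector<int64_t>& dense,
                                const RleInt64Column& rle) {
  assert(static_cast<int64_t>(dense.size()) == rle.length());
  std::vector<int64_t> out(dense.size());
  if (rle.is_constant()) {
    // Fast path: one value for every row, O(1) to detect.
    const int64_t c = rle.values[0];
    for (std::size_t i = 0; i < dense.size(); ++i) out[i] = dense[i] + c;
    return out;
  }
  // General path: walk the runs in order.
  int64_t start = 0;
  for (std::size_t r = 0; r < rle.run_ends.size(); ++r) {
    for (int64_t i = start; i < rle.run_ends[r]; ++i) {
      out[static_cast<std::size_t>(i)] =
          dense[static_cast<std::size_t>(i)] + rle.values[r];
    }
    start = rle.run_ends[r];
  }
  return out;
}
```

In this sketch the "scalar" costs one run end and one value regardless of batch length, and detecting it is a single size check, which is the kind of O(1) overhead the description refers to.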