alamb opened a new issue, #10055:
URL: https://github.com/apache/arrow-rs/issues/10055
### Background

PR #9755 added a `fused_inline_view_columns: FusedInlineViewColumns` field to `BatchCoalescer` (in `arrow-select/src/coalesce.rs`). The field is computed once at construction and consulted on every filtered push to drive the fused inline Utf8View/BinaryView filter path:

```rust
/// Inline view columns eligible for the fused filter path
fused_inline_view_columns: FusedInlineViewColumns,
```

### What we'd like

We would like to **remove this field if possible**. As noted in review, `BatchCoalescer` ends up tracking per-type state even though the rest of the code already keeps per-type state in `InProgressArray`.

See the original suggestion here (quoted below):
- https://github.com/apache/arrow-rs/pull/9755#discussion_r3342384374

> What do you think about modeling this as a separate type of `InProgressArray` -- for example instead of making some different paths for `InProgressByteViewArray` we could instead make `InProgressInlineByteViewArray` -- and then have a function that converts from `InProgressInlineByteViewArray` to `InProgressByteViewArray` if the input has any non inlined views
>
> I think that would make it harder to misuse the APIs

### Constraint to keep in mind

The field was originally introduced as a performance optimization: the PR author measured that re-scanning each `RecordBatch`'s column types on every push cost ~5-6%, so any replacement should avoid reintroducing that per-batch overhead.

This is a follow-on cleanup to #9755 and is not blocking.
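A minimal, standalone sketch of how the suggested modeling might look, assuming a heavily simplified representation: views are plain `u128`s in the Arrow view layout (length in the low 32 bits, values of at most 12 bytes stored inline, buffer index in bits 64..96 otherwise) and data buffers are `Vec<u8>`. The type names `InProgressInlineByteViewArray` and `InProgressByteViewArray` come from the review comment, but their fields and the enum `InProgressViews`, the methods `push_filtered` and `into_general`, and the helper functions are hypothetical and are not the arrow-rs API. The idea is that whether a column is on the fused inline path is recorded by which variant it is in (chosen once at construction), so `BatchCoalescer` needs no separate field. The per-push check is assumed to be O(1): an input view array with no data buffers can only contain inline views (in arrow-rs this would presumably be something like checking whether the array's data buffers are empty).

```rust
use std::mem;

/// Arrow view layout: the low 32 bits of each 128-bit view hold the value
/// length; values of at most 12 bytes are stored entirely inside the view.
const MAX_INLINE_LEN: u32 = 12;

fn view_len(view: u128) -> u32 {
    view as u32
}

/// Non-inline views keep their data buffer index in bits 64..96.
fn buffer_index(view: u128) -> u32 {
    (view >> 64) as u32
}

fn with_buffer_index(view: u128, index: u32) -> u128 {
    let mask: u128 = (u32::MAX as u128) << 64;
    (view & !mask) | ((index as u128) << 64)
}

/// Hypothetical fast-path state: only inline views, so no buffers to track.
#[derive(Debug, Default)]
struct InProgressInlineByteViewArray {
    views: Vec<u128>,
}

/// Hypothetical general state: views plus the data buffers they point into.
#[derive(Debug, Default)]
struct InProgressByteViewArray {
    views: Vec<u128>,
    buffers: Vec<Vec<u8>>,
}

impl InProgressInlineByteViewArray {
    /// Fused filter: copy the selected views, no buffer bookkeeping needed.
    fn push_filtered(&mut self, views: &[u128], keep: &[bool]) {
        for (&v, &k) in views.iter().zip(keep) {
            if k {
                self.views.push(v);
            }
        }
    }

    /// One-way conversion used the first time a non-inline input shows up.
    /// Inline views reference no buffer, so they carry over unchanged.
    fn into_general(self) -> InProgressByteViewArray {
        InProgressByteViewArray { views: self.views, buffers: Vec::new() }
    }
}

impl InProgressByteViewArray {
    fn push_filtered(&mut self, views: &[u128], buffers: &[Vec<u8>], keep: &[bool]) {
        // Incoming buffers are appended after the ones already held, so
        // non-inline views get their buffer index shifted by `base`.
        let base = self.buffers.len() as u32;
        self.buffers.extend(buffers.iter().cloned());
        for (&v, &k) in views.iter().zip(keep) {
            if !k {
                continue;
            }
            if view_len(v) <= MAX_INLINE_LEN {
                self.views.push(v);
            } else {
                self.views.push(with_buffer_index(v, buffer_index(v) + base));
            }
        }
    }
}

/// Hypothetical per-column state: the variant itself records whether the
/// fused inline path still applies.
#[derive(Debug)]
enum InProgressViews {
    Inline(InProgressInlineByteViewArray),
    General(InProgressByteViewArray),
}

impl InProgressViews {
    fn push_filtered(&mut self, views: &[u128], buffers: &[Vec<u8>], keep: &[bool]) {
        if let InProgressViews::Inline(inline) = self {
            // O(1) check on this column only: no data buffers means every view is inline.
            if buffers.is_empty() {
                inline.push_filtered(views, keep);
                return;
            }
            let general = mem::take(inline).into_general();
            *self = InProgressViews::General(general);
        }
        if let InProgressViews::General(general) = self {
            general.push_filtered(views, buffers, keep);
        }
    }
}
```

In this sketch the conversion is one-way, which mirrors the "converts from `InProgressInlineByteViewArray` to `InProgressByteViewArray`" wording in the review comment; how (or whether) a column would return to the inline variant after the coalescer emits a batch is left open here.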
