alamb opened a new issue, #10055:
URL: https://github.com/apache/arrow-rs/issues/10055

   ### Background
   
   PR #9755 added a `fused_inline_view_columns: FusedInlineViewColumns` field 
to `BatchCoalescer` (in `arrow-select/src/coalesce.rs`), computed once at 
construction and consulted on every filtered push to drive the fused inline 
Utf8View/BinaryView filter path:
   
   ```rust
   /// Inline view columns eligible for the fused filter path
   fused_inline_view_columns: FusedInlineViewColumns,
   ```
   
   ### What we'd like
   
   We would like to **remove this field if possible**. As noted in review, it 
makes `BatchCoalescer` itself track per-type state, even though the rest of the 
code already keeps per-type state in `InProgressArray`. See the original 
suggestion here:
   
   - https://github.com/apache/arrow-rs/pull/9755#discussion_r3342384374
   
   Quoted from that comment:
   
   > What do you think about modeling this as a separate type of 
`InProgressArray` -- for example instead of making some different paths for 
`InProgressByteViewArray` we could instead make `InProgressInlineByteViewArray` 
-- and then have a function that converts from `InProgressInlineByteViewArray` 
to `InProgressByteViewArray` if the input has any non inlined views
   >
   > I think that would make it harder to misuse the APIs
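
To make the suggestion concrete, here is a minimal sketch of the shape it could take. It is not the arrow-rs implementation: the two structs are simplified stand-ins that hold raw `u128` views, and the `InProgress` enum, `push_filtered`, `into_general`, and `is_inline` are hypothetical names introduced only for illustration. The one thing taken from the Arrow format is that a view stores its length in the low 32 bits and is inline when that length is at most 12 bytes.

```rust
/// Arrow views with at most 12 bytes of data are stored fully inline.
const MAX_INLINE_LEN: u32 = 12;

/// The length lives in the low 32 bits of a view.
fn is_inline(view: u128) -> bool {
    (view as u32) <= MAX_INLINE_LEN
}

/// Simplified stand-in for the general builder (hypothetical layout).
#[derive(Debug, Default)]
struct InProgressByteViewArray {
    views: Vec<u128>,
    /// Data buffers referenced by non-inline views (copying is omitted here).
    buffers: Vec<Vec<u8>>,
}

/// Simplified stand-in for the inline-only builder (hypothetical layout).
#[derive(Debug, Default)]
struct InProgressInlineByteViewArray {
    views: Vec<u128>,
}

impl InProgressInlineByteViewArray {
    /// Convert to the general builder; inline views need no buffers.
    fn into_general(self) -> InProgressByteViewArray {
        InProgressByteViewArray {
            views: self.views,
            buffers: Vec::new(),
        }
    }
}

/// Per-column state: starts inline-only and falls back at most once.
#[derive(Debug)]
enum InProgress {
    Inline(InProgressInlineByteViewArray),
    General(InProgressByteViewArray),
}

impl InProgress {
    /// Append the views selected by `keep`, converting to the general
    /// builder first if the input contains any non-inline view.
    fn push_filtered(&mut self, views: &[u128], keep: &[bool]) {
        assert_eq!(views.len(), keep.len());
        if let InProgress::Inline(inline) = self {
            if !views.iter().all(|v| is_inline(*v)) {
                let general = std::mem::take(inline).into_general();
                *self = InProgress::General(general);
            }
        }
        let selected = views
            .iter()
            .zip(keep)
            .filter(|(_, k)| **k)
            .map(|(v, _)| *v);
        match self {
            InProgress::Inline(a) => a.views.extend(selected),
            // A real implementation would also copy or remap buffers here.
            InProgress::General(a) => a.views.extend(selected),
        }
    }
}
```

In this shape the inline-only path cannot be handed a non-inline view by mistake, because the fallback happens inside the type rather than in the caller.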
   
   ### Constraint to keep in mind
   
   The field was originally introduced as a performance optimization: the 
author measured that re-scanning each `RecordBatch`'s column types on every 
push cost ~5-6%, so any replacement should avoid reintroducing that per-batch 
overhead.
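
One way to respect this constraint, sketched below under stated assumptions, is to choose each column's builder kind once from the schema at construction, so that a push only dispatches on stored state. `ColumnType`, `BuilderKind`, `Coalescer`, and `builder_kind_for` are hypothetical stand-ins for illustration, not arrow-rs types.

```rust
/// Hypothetical, reduced set of column types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Utf8View,
    BinaryView,
    Int64,
}

/// Hypothetical tag for which in-progress builder a column uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuilderKind {
    InlineView,
    Generic,
}

/// Decide the builder kind from the column type alone.
fn builder_kind_for(ty: ColumnType) -> BuilderKind {
    match ty {
        ColumnType::Utf8View | ColumnType::BinaryView => BuilderKind::InlineView,
        ColumnType::Int64 => BuilderKind::Generic,
    }
}

/// Stand-in coalescer: per-column state only, no separate per-type field.
struct Coalescer {
    builders: Vec<BuilderKind>,
}

impl Coalescer {
    /// The schema is inspected exactly once, here.
    fn new(schema: &[ColumnType]) -> Self {
        Self {
            builders: schema.iter().copied().map(builder_kind_for).collect(),
        }
    }

    /// A push would walk `builders` directly instead of re-scanning types.
    fn builder_kinds(&self) -> &[BuilderKind] {
        &self.builders
    }
}
```

Whether this recovers the measured overhead would still need to be confirmed with benchmarks.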
   
   This is a follow-on cleanup to #9755 and is not blocking.

