alamb opened a new issue, #7458:
URL: https://github.com/apache/arrow-rs/issues/7458

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   This ticket records the symptoms reported by @mbutrovich on 
[Discord](https://discord.com/channels/885562378132000778/1363995762182193373/1363995990935212113),
 where they see inconsistent performance. The root cause appears to be 
allocations made while computing the `RowSelection` when evaluating multiple 
predicates:
   
   
   > In our case it's currently RowSelection::and_then, so I'm trying to make 
sense of that function and see if there's a more efficient way to go about it 
other than the iter().cloned() over both inputs, mutating those,  and building 
the output one element at a time
   > 
   > i was wondering about the better representation of Vec<RowSelector>
   > 
   > I'm coming at Rust from C and C++, and a struct with a uint64 and a bool 
stuck on the end is just gonna end up aligned to 64 bits with a bunch of 
padding on the end between each one. Is Rust going to do something similar?
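
To make the layout question above concrete, here is a minimal sketch that measures a struct shaped like `RowSelector`. `SelectorLike` and `padding_bytes` are hypothetical names used only for illustration; the struct is assumed to mirror the real one (a `usize` count plus a `bool`), and the sketch does not depend on the parquet crate.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in with the same field types assumed for
/// `RowSelector`: a row count and a skip flag.
#[allow(dead_code)]
pub struct SelectorLike {
    pub row_count: usize,
    pub skip: bool,
}

/// Bytes per element that hold no data. A type's size is always a
/// multiple of its alignment, so the single `bool` byte is rounded up
/// to a full `usize`-sized slot.
pub fn padding_bytes() -> usize {
    size_of::<SelectorLike>() - (size_of::<usize>() + size_of::<bool>())
}

/// Returns (size, alignment) of the stand-in struct.
pub fn layout() -> (usize, usize) {
    (size_of::<SelectorLike>(), align_of::<SelectorLike>())
}
```

On a 64-bit target this reports a size of 16 bytes with 8-byte alignment, so 7 of every 16 bytes in a `Vec` of such elements are padding. Rust may reorder fields under the default representation, but it cannot make the struct smaller than this, so the answer to the question is yes.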
   
   Background: 
   
   `RowSelection::and_then` is used to combine the results of multiple 
ArrowPredicates in a 
[RowFilter](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowFilter.html)
 -- see 
[source](https://github.com/apache/arrow-rs/blob/959499bbb58e10e2eb8cf8f54eb9215d4e9d1fef/parquet/src/arrow/arrow_reader/mod.rs#L999-L1002).
   
   Here is the 
[code](https://github.com/apache/arrow-rs/blob/474f1924fff30d3150f7c737205bb9f903686d53/parquet/src/arrow/arrow_reader/selection.rs#L273-L272)
 for `RowSelection::and_then`.
   
   
   **Describe the solution you'd like**
   I would like combining multiple `RowSelection`s (via 
`RowSelection::and_then`) to be faster, ideally with fewer allocations.
   
   **Describe alternatives you've considered**
   Some suggestions from @Dandandan in discord:
   > selectors can reduce allocations from log(N) to 1 using 
Vec::with_capacity(len_left + len_right)
   > Alternatively: the self.selectors allocation probably could be reused for 
the new one
   > Any better way to represent Vec<RowSelector> ?
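
The first suggestion above can be sketched roughly as follows. This is a simplified model, not the parquet implementation: `Selector`, `push_merged`, and `and_then_prealloc` are hypothetical names, and the second input is assumed to be expressed relative to the rows selected by the first, as with `RowSelection::and_then`. Every selector appended to the output finishes either a selector from the first input or one from the second, so `first.len() + other.len()` is an upper bound on the output length and a single up-front allocation is enough.

```rust
/// Hypothetical stand-in for a run of rows that is either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

/// Appends a run to `out`, merging it into the last run when both have
/// the same `skip` flag. Empty runs are ignored.
fn push_merged(out: &mut Vec<Selector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    match out.last_mut() {
        Some(last) if last.skip == skip => last.row_count += row_count,
        _ => out.push(Selector { row_count, skip }),
    }
}

/// Combines `first` with `other`, where `other` only covers the rows
/// that `first` selects. The output is allocated once, up front.
pub fn and_then_prealloc(first: &[Selector], other: &[Selector]) -> Vec<Selector> {
    let mut out = Vec::with_capacity(first.len() + other.len());
    let mut others = other.iter().copied();
    // Unconsumed part of the current selector from `other`.
    let mut current = Selector { row_count: 0, skip: true };

    for f in first {
        if f.skip {
            // Rows skipped by `first` stay skipped and consume nothing from `other`.
            push_merged(&mut out, f.row_count, true);
            continue;
        }
        let mut remaining = f.row_count;
        while remaining > 0 {
            if current.row_count == 0 {
                // Fetch the next selector; if `other` is exhausted, this
                // sketch treats the leftover rows as skipped.
                current = others
                    .next()
                    .unwrap_or(Selector { row_count: remaining, skip: true });
                continue;
            }
            let n = remaining.min(current.row_count);
            push_merged(&mut out, n, current.skip);
            remaining -= n;
            current.row_count -= n;
        }
    }
    out
}
```

For example, combining `[skip 2, select 5, skip 1, select 3]` with `[select 2, skip 4, select 2]` yields `[skip 2, select 2, skip 5, select 2]`. Whether this approach actually helps in the reader would need to be confirmed with a benchmark.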
   
   Here is one idea for better representing `RowSelection` instead of 
`Vec<RowSelector>`
   - https://github.com/apache/arrow-rs/issues/7450
   
   
   
   **Additional context**
   

