XiangpengHao commented on issue #7363: URL: https://github.com/apache/arrow-rs/issues/7363#issuecomment-2770101880
> This is likely the same observation that led [@XiangpengHao](https://github.com/XiangpengHao) to propose adding bitmask support in
>
> * https://github.com/apache/arrow-rs/pull/6624

Exactly, the predicate `SearchPhase <> ''` is not selective, meaning that we have lots of small selections:

```
select 2
skip 3
select 4
....
```

Each [selector](https://github.com/apache/arrow-rs/blob/cf6e041cd11d98376c0f3632903d74f24f14529f/parquet/src/arrow/arrow_reader/selection.rs#L27) is 16 bytes, causing a lot of memory overhead.

This also means that for each small select and skip, we will call the corresponding `read_records` and `skip_records`. These methods are currently not well optimized for millions of calls.

The associated [`and_then`](https://github.com/apache/arrow-rs/blob/cf6e041cd11d98376c0f3632903d74f24f14529f/parquet/src/arrow/arrow_reader/selection.rs#L273) method is also expensive to evaluate.
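To make the memory overhead concrete, here is a rough sketch comparing the run-length form with a one-bit-per-row bitmask. `Selector` is a local stand-in that only mirrors the shape of the linked selector (a row count plus a skip flag); `to_selectors`, `selector_bytes`, and `bitmask_bytes` are hypothetical helpers written for this illustration, not functions from the parquet crate. The 16-byte figure assumes a 64-bit target.

```rust
use std::mem::size_of;

/// Local stand-in mirroring the shape of the linked selector:
/// a row count plus a flag saying whether those rows are skipped.
/// Illustration only, not the crate's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Run-length encode a per-row filter result into select/skip runs.
fn to_selectors(mask: &[bool]) -> Vec<Selector> {
    let mut out: Vec<Selector> = Vec::new();
    for &selected in mask {
        if let Some(last) = out.last_mut() {
            // Same kind of run as the previous row: extend it.
            if last.skip != selected {
                last.row_count += 1;
                continue;
            }
        }
        // First row, or the run kind changed: start a new run.
        out.push(Selector { row_count: 1, skip: !selected });
    }
    out
}

/// Bytes used by the run-length representation.
fn selector_bytes(selectors: &[Selector]) -> usize {
    selectors.len() * size_of::<Selector>()
}

/// Bytes used by a bitmask with one bit per row.
fn bitmask_bytes(num_rows: usize) -> usize {
    (num_rows + 7) / 8
}

fn main() {
    // A non-selective predicate: repeat "select 2, skip 3" over 1M rows.
    let mask: Vec<bool> = (0..1_000_000).map(|i| i % 5 < 2).collect();
    let selectors = to_selectors(&mask);
    println!("bytes per selector: {}", size_of::<Selector>());
    println!("number of selectors: {}", selectors.len());
    println!("run-length bytes:    {}", selector_bytes(&selectors));
    println!("bitmask bytes:       {}", bitmask_bytes(mask.len()));
}
```

With runs this short, every few rows cost a full selector, while the bitmask stays at one bit per row regardless of how fragmented the selection is. The run-length form only wins when runs are long.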
