alamb commented on issue #7456: URL: https://github.com/apache/arrow-rs/issues/7456#issuecomment-2872646158
@zhuqi-lucas has a great insight in https://github.com/apache/arrow-rs/pull/7454: instead of a two-pass algorithm (evaluate the `RowFilter` to form a final `RowSelection`, then decode the columns again with that selection), we can combine the filter application and decode steps (see https://github.com/apache/arrow-rs/pull/7454#pullrequestreview-2833094545).

The current flow goes something like:

1. A set of array readers is created for the filter columns, using the provided `RowSelection` (this captures page pruning).
2. The decoded batches are used to evaluate the `RowFilter` / `ArrowPredicate`s, which produces a `BooleanArray` bitmap.
3. The "final" `RowSelection` is created by combining the existing `RowSelection` with the `BooleanArray`s.
4. A new set of array readers is created with the updated `RowSelection`.

The current PR starts heading down a slightly modified flow, where the `RowSelection` and `RowFilter`s are not combined.

I think a combined solution would look something like:

1. Create decoders for the filter columns and for the projection-only columns.

Decoding proceeds like:

1. Read rows from the filter columns (if any) using the initial `RowSelection` (e.g. 8192 rows at a time).
2. Apply any `RowFilter`s to them (this produces a `BooleanArray`).
3. Repeat 1-2 until at least 8192 (batch size) rows pass the filter. (This means we have a `Vec<BooleanArray>` with at least 8192 `true` values in total, and a `Vec<Array>` for each filter column that is also a projection column.)
4. Then decode as many `RecordBatch`es from the projection-only columns as needed, using the initial `RowSelection`.
5. Apply the filters to each array to form the final output batch (in the projection columns).
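To make the "final `RowSelection`" step of the current two-pass flow concrete, here is a minimal, std-only sketch. It assumes a simplified run-length selection made of hypothetical `Selector { row_count, skip }` runs, and a filter result holding one `bool` per *selected* row. `Selector`, `push_run` and `refine` are illustrative stand-ins, not the arrow-rs `RowSelection`/`RowSelector` API:

```rust
/// Hypothetical, simplified stand-in for one run of a row selection:
/// `row_count` consecutive rows that are either all skipped or all selected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Append a run, merging it into the previous run when `skip` matches.
fn push_run(out: &mut Vec<Selector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    if let Some(last) = out.last_mut() {
        if last.skip == skip {
            last.row_count += row_count;
            return;
        }
    }
    out.push(Selector { row_count, skip });
}

/// Narrow `selection` with `filter`, which holds one predicate result per
/// row selected by `selection` (the rows that were actually decoded).
/// Skipped rows stay skipped; selected rows are kept only if they passed.
fn refine(selection: &[Selector], filter: &[bool]) -> Vec<Selector> {
    let mut out = Vec::new();
    let mut filter_iter = filter.iter();
    for sel in selection {
        if sel.skip {
            push_run(&mut out, sel.row_count, true);
        } else {
            // Each selected row consumes exactly one filter result.
            for _ in 0..sel.row_count {
                let keep = *filter_iter
                    .next()
                    .expect("one filter value per selected row");
                push_run(&mut out, 1, !keep);
            }
        }
    }
    assert!(
        filter_iter.next().is_none(),
        "more filter values than selected rows"
    );
    out
}
```

The refined selection still covers every underlying row, but with fewer selected rows. In the two-pass flow, this selection then drives a second set of array readers, which is where filter columns that are also projected end up being decoded twice.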
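The proposed combined decode loop can be sketched in std-only Rust as follows. This is a simplified model under stated assumptions, not arrow-rs code: each column is a hypothetical `ColumnReader` over `i64` values that the initial `RowSelection` has already chosen. There is one filter column, which is also projected, and one projection-only column. A plain closure stands in for an `ArrowPredicate`. `ColumnReader`, `apply_mask` and `next_batch` are all illustrative names:

```rust
/// Hypothetical column reader over rows already chosen by the initial
/// `RowSelection`; hands out up to `n` decoded values per call.
struct ColumnReader {
    values: Vec<i64>,
    pos: usize,
}

impl ColumnReader {
    fn new(values: Vec<i64>) -> Self {
        Self { values, pos: 0 }
    }

    /// Decode up to `n` rows; returns an empty Vec at end of data.
    fn read(&mut self, n: usize) -> Vec<i64> {
        let end = (self.pos + n).min(self.values.len());
        let out = self.values[self.pos..end].to_vec();
        self.pos = end;
        out
    }
}

/// Keep only the values whose mask entry is `true`.
fn apply_mask(values: &[i64], mask: &[bool]) -> Vec<i64> {
    values
        .iter()
        .zip(mask)
        .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
        .collect()
}

/// Produce one output batch as (filter column, projection-only column)
/// values of the passing rows. Returns `None` once the input is exhausted
/// without any passing rows.
fn next_batch(
    filter_reader: &mut ColumnReader,
    projection_reader: &mut ColumnReader,
    predicate: impl Fn(i64) -> bool,
    batch_size: usize,
) -> Option<(Vec<i64>, Vec<i64>)> {
    let mut masks: Vec<Vec<bool>> = Vec::new();
    let mut filter_chunks: Vec<Vec<i64>> = Vec::new();
    let mut passed = 0;

    // Steps 1-3: decode the filter column and evaluate the predicate until
    // at least `batch_size` rows pass (or the input runs out).
    while passed < batch_size {
        let chunk = filter_reader.read(batch_size);
        if chunk.is_empty() {
            break;
        }
        let mask: Vec<bool> = chunk.iter().map(|&v| predicate(v)).collect();
        passed += mask.iter().filter(|&&b| b).count();
        masks.push(mask);
        filter_chunks.push(chunk);
    }
    if passed == 0 {
        return None;
    }

    // Steps 4-5: decode the same rows from the projection-only column and
    // apply the saved masks. The filter column is reused, not re-decoded.
    let mut filter_out = Vec::new();
    let mut projection_out = Vec::new();
    for (chunk, mask) in filter_chunks.iter().zip(&masks) {
        let proj = projection_reader.read(mask.len());
        filter_out.extend(apply_mask(chunk, mask));
        projection_out.extend(apply_mask(&proj, mask));
    }
    Some((filter_out, projection_out))
}
```

Because the loop stops once *at least* `batch_size` rows pass, a batch here can hold up to `2 * batch_size - 1` rows. A real implementation would likely slice the output to `batch_size` and carry the remainder into the next batch. Keeping the masks around is what lets the projection-only columns be decoded in a single pass, without building an intermediate `RowSelection`.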