alamb commented on code in PR #9301:
URL: https://github.com/apache/arrow-rs/pull/9301#discussion_r2743184527
##########
parquet/src/arrow/push_decoder/reader_builder/mod.rs:
##########
@@ -437,6 +437,18 @@ impl RowGroupReaderBuilder {
.with_parquet_metadata(&self.metadata)
.build_array_reader(self.fields.as_deref(),
predicate.projection())?;
+ // Here is the spot where we could have previous mask
selection and start applying it to a col
+ // with a different pages layout
+ //
+ // Note that the selection could have come without filters and
applied manually
+ // We should keep another override_selector_strategy_if_needed
in waiting_on_data state
+
+ plan_builder = override_selector_strategy_if_needed(
+ plan_builder,
+ predicate.projection(),
+ self.row_group_offset_index(row_group_idx),
+ );
+
Review Comment:
I think this could be a slightly less confusing comment:
```suggestion
// Prepare to evaluate the filter.
// Note: first update the selection strategy to properly
handle any pages
// pruned during fetch
plan_builder = override_selector_strategy_if_needed(
plan_builder,
predicate.projection(),
self.row_group_offset_index(row_group_idx),
);
// `with_predicate` actually evaluates the filter
```
##########
parquet/src/arrow/push_decoder/reader_builder/mod.rs:
##########
@@ -437,6 +437,18 @@ impl RowGroupReaderBuilder {
.with_parquet_metadata(&self.metadata)
.build_array_reader(self.fields.as_deref(),
predicate.projection())?;
Review Comment:
This is a great fix -- and it makes total sense to me (basically that filter
evaluation is a different path, so we need to revert the selector strategy on
the filter path as well)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]