hhhizzz commented on code in PR #9056:
URL: https://github.com/apache/arrow-rs/pull/9056#discussion_r2650017690
##########
parquet/src/arrow/arrow_reader/read_plan.rs:
##########
@@ -110,19 +110,13 @@ impl ReadPlanBuilder {
None => return RowSelectionStrategy::Selectors,
};
- let trimmed = selection.clone().trim();
- let selectors: Vec<RowSelector> = trimmed.into();
- if selectors.is_empty() {
- return RowSelectionStrategy::Mask;
- }
-
- let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
- let selector_count = selectors.len();
- if selector_count == 0 {
+ let non_empty_selector_count = selection.iter().filter(|s| s.row_count > 0).count();
Review Comment:
Good improvement! I didn't notice this in my benchmarks, likely because the
dataset was too small.
Two small suggestions:
- `non_empty_selector_count` is rather long; `effective_count` might read
better.
- Using a single `fold` to replace the two separate iterations might be better.
```rust
let (total_rows, effective_count) = selection.iter()
.fold((0, 0), |(rows, count), s| {
if s.row_count > 0 {
(rows + s.row_count, count + 1)
} else {
(rows, count)
}
});
if effective_count == 0 {
return RowSelectionStrategy::Mask;
}
if total_rows < effective_count.saturating_mul(threshold) {
RowSelectionStrategy::Mask
} else {
RowSelectionStrategy::Selectors
}
```
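For anyone who wants to try the single-pass version outside the crate, here is a minimal self-contained sketch. `RowSelector` and `RowSelectionStrategy` below are simplified stand-ins rather than the real parquet types, and `summarize` and `choose_strategy` are hypothetical helper names used only for illustration; `threshold` is passed in as a plain `usize`.

```rust
// Simplified stand-in for the parquet crate's RowSelector (assumption:
// only `row_count` matters for the strategy decision).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

// Simplified stand-in for the strategy enum discussed in the review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowSelectionStrategy {
    Selectors,
    Mask,
}

/// Hypothetical helper: one pass over the selectors, returning
/// (total rows, number of selectors with row_count > 0).
pub fn summarize(selectors: &[RowSelector]) -> (usize, usize) {
    selectors.iter().fold((0usize, 0usize), |(rows, count), s| {
        if s.row_count > 0 {
            (rows + s.row_count, count + 1)
        } else {
            // Empty selectors contribute neither rows nor a count.
            (rows, count)
        }
    })
}

/// Hypothetical helper mirroring the decision logic in the suggestion:
/// prefer a mask when the average non-empty selector is shorter than `threshold`.
pub fn choose_strategy(selectors: &[RowSelector], threshold: usize) -> RowSelectionStrategy {
    let (total_rows, effective_count) = summarize(selectors);
    if effective_count == 0 {
        return RowSelectionStrategy::Mask;
    }
    if total_rows < effective_count.saturating_mul(threshold) {
        RowSelectionStrategy::Mask
    } else {
        RowSelectionStrategy::Selectors
    }
}
```

Note that the explicit `effective_count == 0` check is what keeps the all-empty case on `Mask`: without it, the comparison `0 < 0` would be false and the function would fall through to `Selectors`.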