alamb commented on code in PR #7850:
URL: https://github.com/apache/arrow-rs/pull/7850#discussion_r2208476062


##########
parquet/src/arrow/async_reader/mod.rs:
##########
@@ -1832,6 +1882,7 @@ mod tests {
         assert_eq!(total_rows, 730);
     }
 
+    #[ignore]

Review Comment:
   This test still fails for me locally when I remove the `#[ignore]` attribute.



##########
parquet/src/arrow/async_reader/mod.rs:
##########
@@ -613,8 +623,18 @@ where
                     .fetch(&mut self.input, predicate.projection(), selection)
                     .await?;
 
+                let mut cache_projection = predicate.projection().clone();
+                cache_projection.intersect(&projection);

Review Comment:
   One thing I didn't understand after reading this PR in detail is how the relative row positions are updated after a filter is applied.
   
   For example, if we are applying multiple filters, the first may reduce the original RowSelection down to `[100->200]`. When the second filter runs, it is evaluated only on rows 100->200, not on the original selection.
   
   In other words, I think there needs to be some sort of function equivalent to `RowSelection::and_then` that applies to the cache:
   
   ```rust
   // Narrow the cache so that it only retains the results of evaluating the predicate
   let row_group_cache = row_group_cache.and_then(resulting_selection);
   ```
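
   To make the relative-position concern concrete, here is a minimal, self-contained sketch of the composition I have in mind. It uses plain boolean masks instead of the real `RowSelection` run-length representation, and the names `and_then` and `selected_rows` below are hypothetical helpers for illustration, not arrow-rs APIs. The point is only that the second selection is indexed relative to the rows the first one kept, so it has to be mapped back to absolute row positions before it can be applied to anything keyed by the original rows (such as a cache).

   ```rust
   /// Compose two selections (illustrative sketch, not the arrow-rs implementation).
   ///
   /// `first` has one flag per row of the original row group.
   /// `second` has one flag per row that `first` selected (relative positions).
   /// The result has one flag per original row (absolute positions).
   fn and_then(first: &[bool], second: &[bool]) -> Vec<bool> {
       let mut relative = second.iter().copied();
       first
           .iter()
           .map(|&kept| {
               if kept {
                   // Only rows that survived `first` consume a flag from `second`.
                   // Missing flags are treated as "not selected" (an assumption).
                   relative.next().unwrap_or(false)
               } else {
                   false
               }
           })
           .collect()
   }

   /// Hypothetical helper: absolute indices of the rows that are still selected.
   fn selected_rows(mask: &[bool]) -> Vec<usize> {
       mask.iter()
           .enumerate()
           .filter_map(|(index, &kept)| if kept { Some(index) } else { None })
           .collect()
   }
   ```

   With a first selection that keeps rows 100->200 of a 300-row group, and a second filter that keeps the first 10 of those 100 rows, the composed selection refers to absolute rows 100->110. A cache narrowed with the uncomposed second selection would instead point at rows 0->10.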



