tustvold opened a new issue, #5036:
URL: https://github.com/apache/arrow-rs/issues/5036

   **Describe the bug**
   
   An invariant of `RowSelection` is that it alternates between select and
   skip, and does not contain any empty `RowSelector`.
   
   This is typically enforced when a RowSelection is created from a slice (or 
vec) of `RowSelector` by `from_selectors_and_combine`.
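   
   The normalization this invariant requires can be sketched roughly as
   follows. This is a minimal illustration, not the crate's implementation of
   `from_selectors_and_combine`: the `Selector` struct and the `combine`
   function are hypothetical stand-ins introduced here.
   
   ```rust
   /// Hypothetical stand-in for `RowSelector`; not the real parquet type.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   struct Selector {
       row_count: usize,
       skip: bool,
   }
   
   /// Drops empty selectors and merges adjacent selectors of the same kind,
   /// so that the result alternates between select and skip.
   fn combine(selectors: &[Selector]) -> Vec<Selector> {
       let mut out: Vec<Selector> = Vec::with_capacity(selectors.len());
       for s in selectors.iter().filter(|s| s.row_count != 0) {
           match out.last_mut() {
               // Same kind as the previous selector: extend it.
               Some(last) if last.skip == s.skip => last.row_count += s.row_count,
               // Different kind (or first selector): start a new run.
               _ => out.push(*s),
           }
       }
       out
   }
   ```
   
   Under these assumptions, `[Skip(3), Select(0), Skip(4), Select(5)]` would
   collapse to `[Skip(7), Select(5)]`.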
   
   A subtle bug was introduced when `intersect_row_selections` was imported
   from DataFusion in
   https://github.com/apache/arrow-rs/pull/3047 and subsequently exposed as a
   member function in
   https://github.com/apache/arrow-rs/pull/3084/files#diff-7638a63d118da0ac5321c1948eb9acfc59f7acee56598879eba8338b2c22ff9eR334
   
   `intersect_row_selections` does not produce a `Vec<RowSelector>` that obeys
   the invariants of `RowSelection`, and yet the member function doesn't call
   `from_selectors_and_combine`.
   
   This results in a `RowSelection` of the form `[Skip(x), Skip(y)]`. The async
   reader determines what data to fetch based on which rows are selected.
   When reading the data, however, it performs each operation in turn. To
   perform the first skip, the reader must set up the decoders at the relevant
   position within the pages, as it doesn't know that the next operation is
   another skip. This in turn causes it to request data that wasn't fetched,
   and the reader bails out with an offset index error.
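   
   A check for the invariant described above might look like the sketch below.
   It is only illustrative: `Selector` and `upholds_invariant` are hypothetical
   names introduced here, not part of the parquet crate.
   
   ```rust
   /// Hypothetical stand-in for `RowSelector`; not the real parquet type.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   struct Selector {
       row_count: usize,
       skip: bool,
   }
   
   /// Returns true if no selector is empty and no two adjacent selectors
   /// are of the same kind (both skip or both select).
   fn upholds_invariant(selectors: &[Selector]) -> bool {
       let no_empty = selectors.iter().all(|s| s.row_count != 0);
       let alternates = selectors.windows(2).all(|w| w[0].skip != w[1].skip);
       no_empty && alternates
   }
   ```
   
   With this check, a sequence of the form `[Skip(x), Skip(y)]` is rejected
   because its two adjacent selectors are both skips.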
   
   **To Reproduce**
   
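   ```rust
   use parquet::arrow::arrow_reader::{RowSelection, RowSelector};

   // Intersecting a selection with itself should yield an equal selection.
   #[test]
   fn test_intersection() {
       let selection = RowSelection::from(vec![RowSelector::select(1048576)]);
       let result = selection.intersection(&selection);
       assert_eq!(result, selection);
   }
   ```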
   
   **Expected behavior**
   
   **Additional context**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.