jorgecarleitao opened a new pull request #9301:
URL: https://github.com/apache/arrow/pull/9301


   This PR fixes two major issues in our `take` kernel for primitive arrays.
   
   Background
   
   * When using `values()` from an array, it is important to remember that only 
certain values are well defined (those whose the null bit is set / no buffer).
   
   * When reading values from an array using `array.value(i)`, it is important 
to remember that it currently performs no bound checks.
   
   * `take` offers an option to deactivate bound checks, by turning an error in 
a panic. that option defaults to not check (i.e. `panic`)
   
   however, `take` kernel respects none of the points above: 
   * it reads and uses undefined indices
   * it accesses out of bound values
   * it does not panic when `check_bounds = false` (it instead reads out of 
bounds)
   
   Specifically, it is currently doing something like the following:
   
   ```rust
   let indices = indices.values();
   for index in indices {
          let index = index.to_usize();
          let taken_value = values.value(index)
   }
   ```
   
   I.e. 
   
   * there is no check that `index < values.len()`. 
   * because `indices.values()` can contain arbitrary data on slot `x`, there 
is no guarantee that `x` results in a valid index.
   
   Combined, this allows for spectacular unbounded memory reads. E.g. it is 
possible that the index on the null slot reads from the offset `i64::MAX - 1`.
   
   This PR fixes this behavior. The implementation in this PR will likely be 
slower than the current implementation, but at least is (IMO) sound.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to