alamb opened a new issue #817:
URL: https://github.com/apache/arrow-rs/issues/817


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   This ticket lists a high level plan to address one of the main sources of 
security issues in arrow-rs
   such as https://github.com/apache/arrow-rs/issues/772 and likely several 
others on 
https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity
   
   As demonstrated in https://github.com/jorgecarleitao/arrow2#why, and almost 
all of the examples in 
https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity,
 creating `ArrayData::new` with invalid arguments can lead to undefined 
behavior.
   
   See also the discussion with @jhorstmann  and others on 
https://lists.apache.org/thread.html/r3f12f3352ca36264622d4103fcb6c7c71544dcaf0f0a7e842f00c3a0%40%3Cdev.arrow.apache.org%3E
   
   **Describe the solution you'd like**
   I propose to follow the C++ implementation (kudos to @pitrou) in 
https://github.com/apache/arrow/blob/b73af9a1607caa4a04e1a11896aed6669847a4d4/cpp/src/arrow/array/validate.cc#L388-L392
   
   Add two new functions:
   1. `ArrayData::validate()` -- checks offsets / buffer sizes, relatively 
inexpensive
   2. `ArrayData::validate_full` --  which calls`validate()` **AND** checks all 
variable length data structures for consistency (e.g. ensures the offsets of a 
StringArray are within the size of the base array
   
   Then, change `ArrayData` to have two constructors:
   1. `unsafe ArrayData::new_unchecked()` - Behaves like `ArrayData::new()` 
does today -- namely has no validation 
   2. `ArrayData::new()` will be safe in the Rust sense -- can not cause 
undefined behavior and thus will call `ArrayData::validate_full`
   
   This design will follow the Rust philosophy of "safe by default" but offer 
an alternative (`unsafe`) mechanism to bypass checking for known good inputs. 
This `unsafe` mechanism has been prototyped by @jhorstmann in  
https://github.com/apache/arrow-rs/pull/813
   
   **Describe alternatives you've considered**
   Could wait for `arrow2` convergence, if that happens, but since the timeline 
on that ETA is still unknown, safety for the `arrow-rs` implementation seems to 
justify spending time here
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to