alamb opened a new issue #817: URL: https://github.com/apache/arrow-rs/issues/817
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This ticket lists a high level plan to address one of the main sources of security issues in arrow-rs such as https://github.com/apache/arrow-rs/issues/772 and likely several others on https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity As demonstrated in https://github.com/jorgecarleitao/arrow2#why, and almost all of the examples in https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Asecurity, creating `ArrayData::new` with invalid arguments can lead to undefined behavior. See also the discussion with @jhorstmann and others on https://lists.apache.org/thread.html/r3f12f3352ca36264622d4103fcb6c7c71544dcaf0f0a7e842f00c3a0%40%3Cdev.arrow.apache.org%3E **Describe the solution you'd like** I propose to follow the C++ implementation (kudos to @pitrou) in https://github.com/apache/arrow/blob/b73af9a1607caa4a04e1a11896aed6669847a4d4/cpp/src/arrow/array/validate.cc#L388-L392 Add two new functions: 1. `ArrayData::validate()` -- checks offsets / buffer sizes, relatively inexpensive 2. `ArrayData::validate_full` -- which calls`validate()` **AND** checks all variable length data structures for consistency (e.g. ensures the offsets of a StringArray are within the size of the base array Then, change `ArrayData` to have two constructors: 1. `unsafe ArrayData::new_unchecked()` - Behaves like `ArrayData::new()` does today -- namely has no validation 2. `ArrayData::new()` will be safe in the Rust sense -- can not cause undefined behavior and thus will call `ArrayData::validate_full` This design will follow the Rust philosophy of "safe by default" but offer an alternative (`unsafe`) mechanism to bypass checking for known good inputs. This `unsafe` mechanism has been prototyped by @jhorstmann in https://github.com/apache/arrow-rs/pull/813 **Describe alternatives you've considered** Could wait for `arrow2` convergence, if that happens, but since the timeline on that ETA is still unknown, safety for the `arrow-rs` implementation seems to justify spending time here **Additional context** Add any other context or screenshots about the feature request here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
