I have created a WIP PR for initial feedback on the approach of validating ArrayData upon creation[1]. If there are no objections to the approach I will complete the implementation over the next few days
The approach that Sergey describes of `get` and `unsafe get_unchecked` sounds like a good one to me if performance testing shows we need a bypass. Andrew [1] https://github.com/apache/arrow-rs/pull/810 On Thu, Sep 30, 2021 at 5:28 PM Sergey Davidoff <[email protected]> wrote: > I believe feature flags are not the right choice here. The problem is that > feature flags are toggled globally, so the behavior of your crate can be > affected by the behavior of some other crate that toggles the feature. > > A better approach is to provide two methods, one safe and the other > unsafe, see for example: > https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get > > https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get_unchecked > > Speaking of O(1) vs O(n), a correct program would perform these checks at > least once anyway, and so the complexity is unavoidable. However, you can > avoid re-checking bounds for the same buffer over and over if the state of > the check is encoded in the type system - for example, the `&str` standard > library type is just a slice that's already been checked and found to be > valid UTF-8. > > Although I struggle to imagine bounds checks being a performance > bottleneck when done once on object creation, given that removing them from > hot loops usually wins you only a couple % in performance. > > чт, 30 сент. 2021 г. в 22:28, Andrew Lamb <[email protected]>: > >> I understand the need to avoid sacrificing performance as much as >> possible. I have begun looking into adding validation into ArrayData::new >> as you suggest. >> >> I am making progress, but haven't fully figured out the nested types yet. >> Hope to have a PR up in the next day or two. >> >> Should we come up with something that adds unacceptable performance >> overhead, we can also feature flag it (so that the check is done by >> default, but can be disabled if needed) >> >> Andrew >> >> On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann < >> [email protected]> wrote: >> >>> Most of these issues seem to originate from creating arrays from >>> ArrayData. >>> While we could validate buffers in the XArray::from implementations, that >>> would have some performance overhead and also some edge cases for nested >>> data. Thinking about List<List<Utf8>>>, we recursively would need to >>> validate and know which buffers are used for offsets or utf8 data. Since >>> the root cause is always ArrayData::new (and possibly >>> ArrayDataBuilder::finish) I would suggest marking those functions as >>> unsafe. >>> >>> Adding checks for the top-level buffer length would still be useful, I'm >>> just a bit opposed to having to iterate over whole buffers for >>> validation, >>> turning these operations from O(1) to O(n). >>> >>> Slightly related, GenericListArray::try_new_from_array_data currently >>> validates that the first offset is 0. I think this check could actually >>> be >>> relaxed and the offset only needs to be in bounds for the child array. >>> >>> Jörn >>> >>> >>> On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <[email protected]> >>> wrote: >>> >>> > Dear Rust Developers, >>> > >>> > As a heads up, several pre-existing security tickets filed against >>> arrow-rs >>> > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago. >>> The >>> > author says he plans to file additional RUSTSEC entries for the other >>> > security tickets [3]. >>> > >>> > The criteria used for adding the arrow issues to the RUSTSEC database >>> is >>> > not clear to me, but given widely used tools such as `cargo audit` >>> report >>> > such issues, it is likely that this will become an visible issue for >>> our >>> > users soon. >>> > >>> > Given this, I will likely start looking into existing security issues >>> [4] >>> > reported against arrow-rs and any help would be appreciated. >>> > >>> > Andrew >>> > >>> > [1] https://github.com/rustsec/advisory-db/pull/1057 >>> > [2] https://github.com/rustsec/advisory-db/pull/1059 >>> > [3] >>> > >>> https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127 >>> > [4] https://github.com/apache/arrow-rs/labels/security >>> > >>> >>> >>> -- >>> *Jörn Horstmann* | Senior Backend Engineer >>> >>> www.signavio.com >>> Kurfürstenstraße 111, 10787 Berlin, Germany >>> >>> Work with us! <https://hubs.ly/H0wwzcr0> >>> >>> >>> >>> <https://www.linkedin.com/company/signavio/> >>> <https://www.twitter.com/signavio> <https://www.facebook.com/signavio> >>> <https://www.youtube.com/user/signavio> >>> <https://www.xing.com/companies/signaviogmbh> >>> >>> [image: BPI Forum] < >>> https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0> >>> >>> HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123 >>> Managing Directors: Dr. Gero Decker, Rouven Morato Adam >>> >>
