In the past I was dealing with something similar. My experience was when data was accepted at the edge, the cost of validating that the first offset is zero, the last is within the data bounds and that all others are equal or increasing was a reasonable overhead associated with validating offsets from unsafe sources.
On Thu, Sep 30, 2021 at 1:29 PM Andrew Lamb <al...@influxdata.com> wrote: > I understand the need to avoid sacrificing performance as much as possible. > I have begun looking into adding validation into ArrayData::new as you > suggest. > > I am making progress, but haven't fully figured out the nested types yet. > Hope to have a PR up in the next day or two. > > Should we come up with something that adds unacceptable performance > overhead, we can also feature flag it (so that the check is done by > default, but can be disabled if needed) > > Andrew > > On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann < > joern.horstm...@signavio.com> > wrote: > > > Most of these issues seem to originate from creating arrays from > ArrayData. > > While we could validate buffers in the XArray::from implementations, that > > would have some performance overhead and also some edge cases for nested > > data. Thinking about List<List<Utf8>>>, we recursively would need to > > validate and know which buffers are used for offsets or utf8 data. Since > > the root cause is always ArrayData::new (and possibly > > ArrayDataBuilder::finish) I would suggest marking those functions as > > unsafe. > > > > Adding checks for the top-level buffer length would still be useful, I'm > > just a bit opposed to having to iterate over whole buffers for > validation, > > turning these operations from O(1) to O(n). > > > > Slightly related, GenericListArray::try_new_from_array_data currently > > validates that the first offset is 0. I think this check could actually > be > > relaxed and the offset only needs to be in bounds for the child array. > > > > Jörn > > > > > > On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com> > wrote: > > > > > Dear Rust Developers, > > > > > > As a heads up, several pre-existing security tickets filed against > > arrow-rs > > > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago. > The > > > author says he plans to file additional RUSTSEC entries for the other > > > security tickets [3]. > > > > > > The criteria used for adding the arrow issues to the RUSTSEC database > is > > > not clear to me, but given widely used tools such as `cargo audit` > > report > > > such issues, it is likely that this will become an visible issue for > our > > > users soon. > > > > > > Given this, I will likely start looking into existing security issues > [4] > > > reported against arrow-rs and any help would be appreciated. > > > > > > Andrew > > > > > > [1] https://github.com/rustsec/advisory-db/pull/1057 > > > [2] https://github.com/rustsec/advisory-db/pull/1059 > > > [3] > > > > https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127 > > > [4] https://github.com/apache/arrow-rs/labels/security > > > > > > > > > -- > > *Jörn Horstmann* | Senior Backend Engineer > > > > www.signavio.com > > Kurfürstenstraße 111, 10787 Berlin, Germany > > > > Work with us! <https://hubs.ly/H0wwzcr0> > > > > > > > > <https://www.linkedin.com/company/signavio/> > > <https://www.twitter.com/signavio> <https://www.facebook.com/signavio> > > <https://www.youtube.com/user/signavio> > > <https://www.xing.com/companies/signaviogmbh> > > > > [image: BPI Forum] < > https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0> > > > > HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123 > > Managing Directors: Dr. Gero Decker, Rouven Morato Adam > > >