I agree that we need to audit use of unsafe and think carefully about how we use it.
This blog post is somewhat tangential since it is primarily about the actix-web situation with the author quitting open source, but "unsafe" played a large role in it, and I think this is worth reading. https://words.steveklabnik.com/a-sad-day-for-rust Andy. On Fri, Jan 17, 2020 at 11:15 AM Neville Dipale <nevilled...@gmail.com> wrote: > Hi Paddy, Arrow Developers, > > I've given this some thought, and I preliminarily think that perhaps we can > audit our use of unsafe and evaluate where we can remove it, propagate it > upwards (and provide safe alternatives) or provide some safety to callers. > > Looking at the 3 options that Paul Kernfeld raised in the linked JIRA: > > > 1. Add in bounds checking so that we don't need to deal with unsafe at > all. > 2. Propagate the unsafes up through the code. > 3. Maintain a safe and unsafe version of each function that is currently > unsafe. > > > I think bounds checking would hurt performance, an example being the > changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In > ARROW-4670, I believe we were able to get the compiler to auto-vectorise > due to Array::value() avoiding bounds checks. In the case of compute, we > are in control of the array length, and so we know that it's safe to skip > bounds checking. I presume this would largely be the case in tabular-data > use-cases (because we assert that arrows in a record batch meet certain > criteria). > > From a cursory glance, if we do find that we don't need explicit SIMD > (still immature in Rust, I've found it difficult to implement in some > cases), we could potentially reduce our unsafe count by around 20%. The > flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd > need to maybe adopt option 2 from Paul on IPC once we're done with the > basics. > > We'd then mainly be left with bit manipulation and `Buffer` (which as as > much unsafe as the fbs generated files). I think the API around buffer > would depend on whether we're expecting (based on what can be done with > buffers) this to be exposed to users beyond those using Arrow as a > development platform. > > The above are some of my thoughts, but important's that I don't have a lot > of experience with Rust, especially `unsafe` and the other dark corners of > the language. > > Regards > Neville > > On Fri, 10 Jan 2020 at 04:13, paddy horan <paddyho...@hotmail.com> wrote: > > > Hi All, > > > > This time last year there was a brief discussion on the usage of unsafe > in > > Rust (a user on github raised the issue and I created the JIRA). [1] > > > > So far we mostly avoid unsafe in the public API's. The thinking here is > > that Arrow is a "development platform", i.e. lower level that most > > libraries, and library builders will want to avoid any performance hit of > > bounds checking, etc. > > > > This is not typical in the Rust community where unsafe is a clear signal > > that care is needed. Although it might clutter the API a little more I > > would be in favor of having safe and unsafe variants of methods as > needed. > > For instance, "value" for array access would be changed to "value" and > > "value_unchecked" where the latter is unsafe and does not perform bounds > > checks. > > > > We don't have a huge number of libraries building on top of Arrow in Rust > > at the moment so it seems like a good time, before 1.0, to decide on this > > to avoid breaking changes to the public API in post 1.0. > > > > Thoughts? > > > > Paddy > > > > [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557 > > > > >