I agree that we need to audit use of unsafe and think carefully about how
we use it.

This blog post is somewhat tangential since it is primarily about the
actix-web situation with the author quitting open source, but "unsafe"
played a large role in it, and I think this is worth reading.

 https://words.steveklabnik.com/a-sad-day-for-rust

Andy.

On Fri, Jan 17, 2020 at 11:15 AM Neville Dipale <nevilled...@gmail.com>
wrote:

> Hi Paddy, Arrow Developers,
>
> I've given this some thought, and I preliminarily think that perhaps we can
> audit our use of unsafe and evaluate where we can remove it, propagate it
> upwards (and provide safe alternatives) or provide some safety to callers.
>
> Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:
>
>
>    1. Add in bounds checking so that we don't need to deal with unsafe at
>    all.
>    2. Propagate the unsafes up through the code.
>    3. Maintain a safe and unsafe version of each function that is currently
>    unsafe.
>
>
> I think bounds checking would hurt performance, an example being the
> changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
> ARROW-4670, I believe we were able to get the compiler to auto-vectorise
> due to Array::value() avoiding bounds checks. In the case of compute, we
> are in control of the array length, and so we know that it's safe to skip
> bounds checking. I presume this would largely be the case in tabular-data
> use-cases (because we assert that arrows in a record batch meet certain
> criteria).
>
> From a cursory glance, if we do find that we don't need explicit SIMD
> (still immature in Rust, I've found it difficult to implement in some
> cases), we could potentially reduce our unsafe count by around 20%. The
> flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
> need to maybe adopt option 2 from Paul on IPC once we're done with the
> basics.
>
> We'd then mainly be left with bit manipulation and `Buffer` (which as as
> much unsafe as the fbs generated files). I think the API around buffer
> would depend on whether we're expecting (based on what can be done with
> buffers) this to be exposed to users beyond those using Arrow as a
> development platform.
>
> The above are some of my thoughts, but important's that I don't have a lot
> of experience with Rust, especially `unsafe` and the other dark corners of
> the language.
>
> Regards
> Neville
>
> On Fri, 10 Jan 2020 at 04:13, paddy horan <paddyho...@hotmail.com> wrote:
>
> > Hi All,
> >
> > This time last year there was a brief discussion on the usage of unsafe
> in
> > Rust (a user on github raised the issue and I created the JIRA). [1]
> >
> > So far we mostly avoid unsafe in the public API's.  The thinking here is
> > that Arrow is a "development platform", i.e. lower level that most
> > libraries, and library builders will want to avoid any performance hit of
> > bounds checking, etc.
> >
> > This is not typical in the Rust community where unsafe is a clear signal
> > that care is needed.  Although it might clutter the API a little more I
> > would be in favor of having safe and unsafe variants of methods as
> needed.
> > For instance, "value" for array access would be changed to "value" and
> > "value_unchecked" where the latter is unsafe and does not perform bounds
> > checks.
> >
> > We don't have a huge number of libraries building on top of Arrow in Rust
> > at the moment so it seems like a good time, before 1.0, to decide on this
> > to avoid breaking changes to the public API in post 1.0.
> >
> > Thoughts?
> >
> > Paddy
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
> >
> >
>

Reply via email to