Hi Jorge,

I tried running the code you pasted but it didnt compile. I get the next
error:

the trait `AsRef<[u8]>` is not implemented for `[i32; 2i32]`


I had to change it to this to compile:

let buffer = Buffer::from(&[0u8, 2]);
> let data = ArrayData::new(DataType::Int64, 10, None, None, 0,
> vec![buffer], vec![]);
> let array = Float64Array::from(Arc::new(data));
>
println!("{:?}", array.value(1))


I didn't get an error out of it.

I do agree with you that there are several instances of unsafe in the code
that are not properly justified and that may lead to more problems in
the future.

Another thing that I have noticed is the pattern used in the api where a
struct has an implementation called new and another called
new_with_options. The second function happens to exist only if you want to
create it using an options object. This could be simplified by using an
enum with all the possible options and usen an Option<&[NEW ENUM]> for the
optional parameters.

I think it does make sense to think about the crate and plan it better to
avoid these issues and improve it further.

Regards,
Fernando

On Sun, Feb 7, 2021 at 1:42 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> Over the past 4 months, I have been growing more and more frustrated by the
> amount of undefined behaviour that I am finding and fixing on the Rust
> implementation. I would like to open the discussion of a broader overview
> about the problem in light of our current knowledge and what Rust enables
> as well as offer a solution to the bigger problem.
>
> Just to give you a gist of the seriousness of the issue, the following
> currently compiles, runs, and is undefined behavior in Rust:
>
> let buffer = Buffer::from(&[0i32, 2i32]);let data =
> ArrayData::new(DataType::Int64, 10, 0, None, 0, vec![buffer],
> vec![]);let array = Float64Array::from(Arc::new(data));
> println!("{:?}", array.value(1));
>
> I would like to propose a major refactor of the crate around physical
> traits, Buffer, MutableBuffer and ArrayData to make our code type-safe at
> compile time, thereby avoiding things like the example above from happening
> again.
>
> So far, I was able to reproduce all core features of the arrow crate
> (nested types, dynamic typing, FFI, memory alignment, performance) by using
> `Buffer<T: NativeType>` instead of `Buffer` and removing `ArrayData` and
> RawPointer altogether.
>
> Safety-wise, it significantly limits the usage of `unsafe` on higher end
> APIs, it has a single transmute (the bit chunk iterator one), and a
> guaranteed safe public API (which is not the case in our master, as shown
> above).
>
> Performance wise, it yields a 1.3x improvement over the current master
> (after this fix <https://github.com/apache/arrow/pull/9301> of UB on the
> take kernel, 1.7x prior to it) for the `take` kernel for primitives. I
> should have other major performance improvements.
>
> API wise, it simplifies the traits that we have for memory layout as well
> as the handling of bitmaps, offsets, etc.
>
> The proposal is drafted as a README
> <https://github.com/jorgecarleitao/arrow2/blob/proposal/README.md> on a
> repo that I created specifically for this from the ground up, and the full
> set of changes are in a PR <
> https://github.com/jorgecarleitao/arrow2/pull/1>
> so that anyone can view and comment on it. I haven't made any PR to master
> because this is too large to track as a diff against master, and is beyond
> the point, anyways.
>
> I haven't ported most of the crate as I only tried the non-trivial features
> (memory layout, bitmaps, FFI, dynamic typing, nested types).
>
> I would highly appreciate your thoughts about it.
>
> Best,
> Jorge
>

Reply via email to