Krisztian and I have been working together on a native Rust implementation
of Arrow and have been exploring some different approaches.

I thought it was probably time to update everyone on this mailing list and
open this up to some more opinions.

I have filed a JIRA (https://issues.apache.org/jira/browse/ARROW-2361) for
donating a Rust implementation of Arrow and I also have a PR (
https://github.com/apache/arrow/pull/1804).

This initial code implements arrays supporting a subset of types
(primitives, strings, and structs). It uses contiguous regions of memory
but memory is not byte aligned yet. That isn't hard but I am more
interested in making sure the API is good first.
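
For illustration, aligned allocation could be sketched roughly as follows.
This is not the code in the PR: the `AlignedBuffer` type and the 64-byte
`ALIGNMENT` constant are assumptions made for this example, and it relies
only on the standard library's `std::alloc` functions.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Assumed alignment for this sketch; the real value is a design decision.
pub const ALIGNMENT: usize = 64;

/// Hypothetical fixed-size, zero-initialised byte buffer whose start
/// address is a multiple of `ALIGNMENT`.
pub struct AlignedBuffer {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuffer {
    /// Allocates `len` zeroed bytes. `len` must be non-zero because the
    /// global allocator does not accept zero-sized layouts.
    pub fn new(len: usize) -> Self {
        assert!(len > 0, "zero-sized buffers are not handled in this sketch");
        let layout = Layout::from_size_align(len, ALIGNMENT).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size, as asserted above.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        AlignedBuffer { ptr, layout }
    }

    /// Raw pointer to the first byte of the buffer.
    pub fn as_ptr(&self) -> *const u8 {
        self.ptr.as_ptr()
    }

    /// Read-only view of the whole buffer.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialised (zeroed) bytes
        // owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

The layout is stored alongside the pointer because deallocation has to use
the same size and alignment as the original allocation.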

This PR is based on Rust enums, which provide type safety and pattern
matching at runtime. The other approach we have been exploring is
trait-based, but we are having trouble making that work for structs, where
we need mixed types. We are also hitting problems when we need to know
types at runtime, because of Rust's type erasure. So currently I think
enums are the way to go.
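
To make the enum approach concrete for readers of this thread, here is a
minimal sketch. The names (`ArrayData`, `sum_i32`) and the use of plain
`Vec` storage are hypothetical and chosen for brevity; they are not the API
in the PR. The point is only that a struct column can hold children of
mixed types and that the concrete type can be recovered at runtime with
`match`.

```rust
/// Hypothetical enum-based array; each variant carries its own storage.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrayData {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
    Utf8(Vec<String>),
    /// A struct column is a list of child arrays, which may differ in type.
    Struct(Vec<ArrayData>),
}

impl ArrayData {
    /// Number of rows. For a struct this is taken from the first child,
    /// assuming all children have the same length.
    pub fn len(&self) -> usize {
        match self {
            ArrayData::Int32(v) => v.len(),
            ArrayData::Float64(v) => v.len(),
            ArrayData::Utf8(v) => v.len(),
            ArrayData::Struct(children) => children.first().map_or(0, |c| c.len()),
        }
    }

    /// Runtime type name, available without any downcasting.
    pub fn type_name(&self) -> &'static str {
        match self {
            ArrayData::Int32(_) => "int32",
            ArrayData::Float64(_) => "float64",
            ArrayData::Utf8(_) => "utf8",
            ArrayData::Struct(_) => "struct",
        }
    }
}

/// Sums an Int32 array; returns None for any other variant.
pub fn sum_i32(array: &ArrayData) -> Option<i64> {
    match array {
        ArrayData::Int32(values) => Some(values.iter().map(|&v| i64::from(v)).sum()),
        _ => None,
    }
}
```

The cost is the one mentioned below: every operation needs a `match` arm
per variant, which is where the verbosity comes from.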

I am actively developing this code and using it as the foundation of
DataFusion and I am therefore confident it can do everything we need (with
some more work). The only thing I am unhappy about is the verbosity of some
of the code, but that can be fixed with macros.

I would love to get some more opinions on this.

Thanks,

Andy.







On Sat, Mar 24, 2018 at 8:03 AM, Andy Grove <andygrov...@gmail.com> wrote:

> Krisztián,
>
> This is great research. I totally agree on using a Vec abstraction and
> using traits over enums.
>
> I know you have some working code already (albeit mostly just API) and I
> would suggest you create a PR to get that submitted as a starting point for
> us all to start contributing.
>
> I'm excited to start contributing to this and using DataFusion as a use
> case to drive requirements.
>
> Thanks,
>
> Andy.
>
>
>
>
>
> On Fri, Mar 23, 2018 at 1:09 PM, Krisztián Szűcs <
> szucs.kriszt...@gmail.com> wrote:
>
>>
>> Hey!
>> I've done a little research about implementing Arrow in Rust and I'd like
>> to share my thoughts. Andy, please correct me if I'm wrong; I'm still
>> hiking Rust's learning curve.
>>
>> My first plan was to re-implement iron-arrow and mirror the cpp API as
>> closely as possible, but I realized that Rust can provide better
>> ergonomics, somewhere between cpp and Python. Also, Cargo makes it
>> possible to reuse other libraries more easily.
>>
>> A couple of my findings:
>>
>> We should provide a Vec-like API for arrow::Array; a high-quality example
>> is servo/smallvec
>> (https://github.com/servo/rust-smallvec/blob/master/lib.rs#L80).
>>
>> A freeze method which turns a mutable array (ArrayBuilder in cpp's
>> notation) into an immutable one: ArrayMut.freeze() -> Array. The idea is
>> taken from the bytes crate
>> (https://carllerche.github.io/bytes/bytes/struct.BytesMut.html#examples).
>> The bytes crate would be great to use as a buffer, but sadly it doesn't
>> support custom memory layouts.
>>
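As a rough illustration of the freeze idea quoted above, a sketch could look
like the following. Only the names `ArrayMut`, `Array`, and `freeze` come
from the message; the `Vec` and `Arc` storage, and the `new`, `push`, `len`,
and `get` methods, are assumptions made for this example.

```rust
use std::sync::Arc;

/// Mutable, growable array (the builder side).
pub struct ArrayMut<T> {
    values: Vec<T>,
}

impl<T> ArrayMut<T> {
    pub fn new() -> Self {
        ArrayMut { values: Vec::new() }
    }

    pub fn push(&mut self, value: T) {
        self.values.push(value);
    }

    /// Consumes the builder, so no mutable handle can outlive the freeze.
    pub fn freeze(self) -> Array<T> {
        Array { values: Arc::from(self.values) }
    }
}

/// Immutable array. Cloning only bumps a reference count, so the
/// underlying values are shared rather than copied.
#[derive(Clone)]
pub struct Array<T> {
    values: Arc<[T]>,
}

impl<T> Array<T> {
    pub fn len(&self) -> usize {
        self.values.len()
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        self.values.get(index)
    }
}
```

Because `freeze` takes `self` by value, the compiler rejects any further
`push` on the builder after it has been frozen.
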
>> Use the nightly allocator_api and raw_vec instead of a handcrafted one.
>> The only disadvantage is that it's not stabilized yet; however, it's on
>> the roadmap, see language improvements
>> (https://blog.rust-lang.org/2018/03/12/roadmap.html). See the RFC
>> (https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md).
>> Pros briefly:
>>
>> Pluggable allocators, like https://github.com/alexcrichton/jemallocator
>>
>> Layout (https://doc.rust-lang.org/alloc/allocator/struct.Layout.html)
>> abstraction
>>
>> Easy to start with the Heap
>> (https://doc.rust-lang.org/std/heap/struct.Heap.html) implementation
>>
>> A low-level RawVec
>> (https://doc.rust-lang.org/nightly/alloc/raw_vec/struct.RawVec.html)
>> which is pretty close to Arrow's buffer
>> IMHO we should prefer trait based abstractions instead of enums, because
>> that would provide more flexibility and extensibility (with associated
>> types).
>>
>> If possible, reuse bitvec implementations: bit-vec
>> (https://github.com/contain-rs/bit-vec), bitvec
>> (https://github.com/marcianx/bitvec-rs)
>>
>> I think we should specify the desired user-facing API; then we might be
>> able to plan the development. We can also have some help from Alex
>> Crichton (https://github.com/alexcrichton), a core rust-lang and
>> ecosystem developer. He was really helpful and has given me a couple of
>> hints already.
>>
>> What do you think?
>> Krisztian
>> On Fri, Mar 23, 2018 at 5:20 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> > Just "rust" would be fine for the top-level directory, I think.
>> >
>> > On Fri, Mar 23, 2018 at 12:09 PM, Andy Grove <andygrov...@gmail.com> wrote:
>> > > OK I would be happy with that. How should I get started? Should I just
>> > > create a PR to add a `rust` or `rust-native` root level directory with
>> > > some starting code? I could do that this weekend.
>> > >
>> > > Thanks,
>> > >
>> > > Andy.
>> > >
>> > > On Fri, Mar 23, 2018 at 10:04 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> > >
>> > >> > Wes - if we continue developing in a separate repo for now to prove
>> > >> > commitment levels and get this further along, does that actually make
>> > >> > the IP clearance procedure harder with more individual contributors
>> > >> > involved?
>> > >>
>> > >> Yes, this will make things harder (since we will have to chase down
>> > >> ICLAs from each contributor). If you are going to work on a native
>> > >> implementation, I strongly recommend doing the work in the Apache
>> > >> community. The code does not need to be API-stable nor
>> > >> production-ready to go into the master branch.
>> > >>
>> > >> Thanks
>> > >>
>> > >> On Fri, Mar 23, 2018 at 11:51 AM, Andy Grove <andygrov...@gmail.com> wrote:
>> > >> > I probably shouldn't have used the term binding. I am primarily
>> > >> > interested in a native Rust implementation but it should be possible to
>> > >> > have traits defining the interface and two implementations - one native
>> > >> > and one using FFI to call C. Rust typically has zero overhead when
>> > >> > calling C code. I need to know more about Arrow before I can say for
>> > >> > sure.
>> > >> >
>> > >> > Wes - if we continue developing in a separate repo for now to prove
>> > >> > commitment levels and get this further along, does that actually make
>> > >> > the IP clearance procedure harder with more individual contributors
>> > >> > involved?
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Andy.
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Fri, Mar 23, 2018 at 9:11 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> > >> >
>> > >> >> Not knowing the Rust ecosystem very well, I'm interested in the
>> > >> >> pros/cons of building and maintaining Rust bindings vs. a native Rust
>> > >> >> implementation, or some hybrid of the two. Seems like both bindings
>> > >> >> and native implementation could be part of the same codebase
>> > >> >> potentially.
>> > >> >>
>> > >> >> If we decide to import https://github.com/jihoonson/iron-arrow into
>> > >> >> the Apache Arrow project, it will take 1-2 weeks to conduct the IP
>> > >> >> clearance procedure as we recently did for the Go implementation. This
>> > >> >> is a lot of legwork for the PMC, so I want to make sure before we do
>> > >> >> this that it is worth it, and that there's a plan to continue actively
>> > >> >> developing this code.
>> > >> >>
>> > >> >> Thanks
>> > >> >> Wes
>> > >> >>
>> > >> >> On Fri, Mar 23, 2018 at 11:02 AM, Andy Grove <andygrov...@gmail.com> wrote:
>> > >> >> > My personal view (and I think I've seen others state this already
>> > >> >> > here) is that we should bring it into the repo sooner rather than
>> > >> >> > later and work on it there. The version is 0.1.0 so I think that sets
>> > >> >> > people's expectations about how complete it is.
>> > >> >> >
>> > >> >> > I think it is better for people to see it in the arrow repo being
>> > >> >> > actively developed. I'm very interested in getting compatibility unit
>> > >> >> > tests set up soon too so we can be sure it really is compatible with
>> > >> >> > the other implementations.
>> > >> >> >
>> > >> >> > Andy.
>> > >> >> >
>> > >> >> > On Fri, Mar 23, 2018 at 8:44 AM, paddy horan <paddyho...@hotmail.com> wrote:
>> > >> >> >
>> > >> >> >> Hi Andy,
>> > >> >> >>
>> > >> >> >> I’m looking to get involved in contributing to the Rust
>> > >> >> >> implementation also, would love to see it in the arrow repo sooner
>> > >> >> >> rather than later.
>> > >> >> >>
>> > >> >> >> Should we identify what needs to be added to iron-arrow before it’s
>> > >> >> >> ready to be donated to the Apache repo?
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >> Paddy
>> > >> >> >>
>> > >> >> >> _____________________________
>> > >> >> >> From: Andy Grove <andygrov...@gmail.com>
>> > >> >> >> Sent: Friday, March 23, 2018 9:08 AM
>> > >> >> >> Subject: Rust bindings
>> > >> >> >> To: <dev@arrow.apache.org>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Hi,
>> > >> >> >>
>> > >> >> >> Congratulations on the release of the Go bindings for Arrow. I think
>> > >> >> >> Rust should be next ;-)
>> > >> >> >>
>> > >> >> >> I've been a bit distracted getting a release out in the day job but am
>> > >> >> >> now working on iron-arrow and getting it ready to integrate with my
>> > >> >> >> project. I hope to be able to put some time in this weekend on this. I
>> > >> >> >> don't think it will be very hard to get to a point where I am at least
>> > >> >> >> using the Array type.
>> > >> >> >>
>> > >> >> >> I can commit to working on the Rust bindings moving forward (weekends
>> > >> >> >> mostly) so I think we should go ahead and do this under the arrow repo
>> > >> >> >> if everyone is in agreement.
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >>
>> > >> >> >> Andy,
>> > >> >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> >
>>
>>
>
