I also like the idea of moving arrow2/parquet2 into the official repos.
This is effectively what we did with Ballista, which is still experimental.
Ballista was simpler because it depends on DataFusion rather than the other
way around, but I like the idea of using feature flags to enable DataFusion
on arrow2/parquet2.
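
For readers unfamiliar with the mechanism, here is a rough sketch of what
gating an implementation behind a Cargo feature flag can look like. The
feature name "experimental" and both module names are hypothetical and are
not taken from arrow, arrow2, or DataFusion; this only illustrates
compile-time selection.

```rust
// Hypothetical sketch: the feature name "experimental" and both modules are
// illustrative assumptions, not code from the arrow or arrow2 crates.
//
// The consuming crate's Cargo.toml would declare something like:
//   [features]
//   experimental = []

/// Existing implementation, always compiled.
mod legacy {
    pub fn backend_name() -> &'static str {
        "arrow"
    }
}

/// New implementation, only compiled when the flag is enabled.
#[cfg(feature = "experimental")]
mod experimental {
    pub fn backend_name() -> &'static str {
        "arrow2"
    }
}

/// Selects the backend at compile time based on the feature flag.
pub fn active_backend() -> &'static str {
    #[cfg(feature = "experimental")]
    {
        experimental::backend_name()
    }
    #[cfg(not(feature = "experimental"))]
    {
        legacy::backend_name()
    }
}

fn main() {
    println!("active backend: {}", active_backend());
}
```

Under these assumptions, building with `cargo build --features experimental`
would select the new module, and a default build keeps the existing one.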

I don't see any reason why we couldn't also release arrow2/parquet2 with
suitable 0.x.x versioning (as we plan on doing with Ballista), and
releasing would be much easier if they are in the official repos.


On Tue, Aug 3, 2021 at 7:13 AM paddy horan <paddyho...@hotmail.com> wrote:

> Hi Jorge,
>
> What do you think about moving Arrow2 into the main Arrow repo where it is
> only enabled via an "experimental" feature flag?  This would allow
> development of Arrow2 to proceed in the main repo but also this would be a
> clear signal that Arrow2 is <1.0.  When we feel ready (i.e. Arrow2 is 1.0)
> we can release it in the next main release with Arrow2 being the default
> and move the existing implementation behind a "legacy" feature flag.
>
> Here is why I think this might work well:
>  - People contributing to the Arrow project will naturally contribute to
> Arrow2.  At the moment, some people will still contribute to Arrow instead
> of Arrow2 just by virtue of it being the "official" implementation.
> However, if both are in one repo people will want to contribute to the
> "future", i.e. Arrow2.
>  - the experimental flag will be a clear signal to the existing Arrow
> community that Arrow2 is the future but that it is <1.0
>  - existing users will be well supported in this transition
>  - In general, I think the longer that development proceeds in separate
> repos the harder it will be to eventually merge the two in a way that
> supports existing users.
>
> Do you think this would work?
>
> Paddy
>
> -----Original Message-----
> From: Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> Sent: Monday, August 2, 2021 1:59 PM
> To: dev@arrow.apache.org
> Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
>
> Hi,
>
> Sorry for the delay.
>
> If there is a path towards an official release under a <1.0.0 versioning
> schema aligned with the rest of the Rust ecosystem and in line with the
> stability of the API, then IMO we should move all development to within
> Apache experimental asap (I can handle this and the likely IP clearance
> round). If we require a release >=1.X.Y to it and/or a schedule, then I
> prefer to keep expectations aligned and postpone any movement.
>
> If we do move, I was thinking of something like the following:
>
> * gradually stop maintaining "arrow" in crates, offering a maintenance
> window over which we release patches (*)
> * work towards achieving feature parity on arrow2/parquet2 on the
> experimental repos.
> * keep releasing arrow2/parquet2 under a 0.X model during the step above
> (**)
> * migrate to arrow-rs and archive experimentals (***)
> * break arrow2 into smaller crates so that we can version the APIs at a
> different cadence
> * once a crate reaches some stability (this is always opinionated, but it
> is fine), we bump it to 1.0 and announce a maintenance plan ala tokio
> <https://tokio.rs/blog/2020-12-tokio-1-0>.
>
> (*) e.g. "we will continue to patch the arrow crate up to at least 6
> months starting after the first release of arrow2 that supports
> a) nested parquet read and write
> b) union array (including IPC integration tests)
> c) map array (including IPC integration tests)"
>
> (**) officially or un-officially (I would suggest officially so that we
> can acknowledge everyone's work on it, but no strong feelings)
>
> (***) something like:
> 1. place arrow2 on top of a clear arrow repo so that the full contribution
> history up to that point is preserved
> 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from
> arrow-rs) and archive the experimental repos; create arrow-rs-parquet or
> something for parquet2.
>
> In summary, the core pain point for me is the current versioning of arrow,
> which I feel is incompatible with my goals for arrow2 and the ecosystem I
> envision it supporting :)
>
> Best,
> Jorge
>
> On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > I think it would also be fine to push "beta" arrow2 crates out of a
> > repo under apache/ so long as they are not marked on crates.io as
> > being Apache-official releases. There's a possible slippery slope
> > there, but as long as we are on a path to formalizing the releases I
> > think it is okay.
> >
> > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > Jorge -- do you feel like we have a resolution on what to do with
> > > arrow2 in the near term?
> > >
> > > The current state of affairs seems to me that arrow2 is released from
> > > https://github.com/jorgecarleitao/arrow2 to crates.io (which is fine).
> > > Are you happy with keeping development in the jorgecarleitao repo where
> > > you will retain maximal control and flexibility until it is ready to
> > > start integrating?
> > >
> > > Or would you prefer to put it into one of the apache repos and subject
> > > its development and release to the normal Arrow governance model
> > > (tarball, vote, etc)?
> > >
> > > Since you are the primary author/architect I think you should have a
> > > substantial say at this stage.
> > >
> > > Andrew
> > >
> > >
> > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com> wrote:
> > >
> > > > > I would be happy with this approach. Thank you for the suggestion.
> > > >
> > > > This hybrid approach of both arrow and arrow2 in the same repo
> > > > seems better to me than separate repos.
> > > >
> > > > What I really care about is ensuring we don't have two crates/APIs
> > > > indefinitely -- as long as we are continually making progress
> > > > towards unification that is what is important to me.
> > > >
> > > > Andrew
> > > >
> > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > >> Apologies for being late to this discussion.
> > > >>
> > > > >> There is a hybrid option to consider here where we add the arrow2
> > > > >> code into the arrow crate as a separate module, so we release one
> > > > >> crate containing the "old" API (which we can mark as deprecated) as
> > > > >> well as the new API. Java did a similar thing a long time ago with
> > > > >> "java.io" versus "java.nio" (new IO).
> > > >>
> > > >> I agree that the versioning wouldn't be ideal, but this seems
> > > >> like it might be a pragmatic compromise?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Andy.
> > > >>
> > > >>
> > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb <al...@influxdata.com> wrote:
> > > >>
> > > > >> > What I meant is that when you decide arrow2 is suitable for
> > > > >> > release to existing arrow users, I stand ready to help you
> > > > >> > incorporate it into arrow.
> > > > >> >
> > > > >> > All the feedback I have heard so far from the rest of the
> > > > >> > community is that we are ready. One might even say we are anxious
> > > > >> > to do so :)
> > > >> >
> > > >> > Andrew
> > > >> >
> > > >>
> > > >
> > >
> >
>
