Thanks for all the feedback so far.

It does seem that the least contentious way to do this would be to follow
Andrew's suggestion of having a separate
apache/[arrow-]datafusion-sqlparser repository, as this will ensure that we
do not end up adding any DataFusion dependencies to the sqlparser project,
and that it continues to have its own release process.

The main benefit here is that this would bring the project under ASF
governance and allow those who have permission from their employers to
contribute to Apache Arrow/DataFusion to help with the maintenance burden.

Andy.



On Wed, Feb 28, 2024 at 4:28 AM Andrew Lamb <al...@influxdata.com> wrote:

> One potential way "moving sqlparser-rs into DataFusion" could look is that
> the code/repo is moved from the sqlparser-rs [1] organization to the apache
> organization. For example:
>
> https://github.com/sqlparser-rs/sqlparser-rs
> to
> https://github.com/apache/datafusion-sqlparser
>
> We could continue development separately from any other code, release it as
> a separate artifact, but use the same overarching governance structure
> (voting on releases, committer access, etc.).
>
> To follow this model, I think the largest work item would be to run the IP
> clearance process, and since sqlparser-rs has many distinct contributors
> that may take a while.
>
> Andrew
>
>
>
> On Wed, Feb 28, 2024 at 1:45 AM Aldrin <octalene....@pm.me.invalid> wrote:
>
> > Maybe it would be valuable to more explicitly define "moving back into
> > DataFusion project".
> >
> > I assumed it meant absorbing into the datafusion repo, but it occurs to
> me
> > that may not be the case. Then, how would sqlparser-rs be "moved"?
> >
> >
> >
> > # ------------------------------
> > # Aldrin
> >
> >
> > https://github.com/drin/
> > https://gitlab.com/octalene
> > https://keybase.io/octalene
> >
> >
> > On Tuesday, February 27th, 2024 at 16:20, Chak-Pong Chung <
> > chakpongch...@gmail.com> wrote:
> >
> > > There are cases where people need datafusion but not a SQL parser. For
> > > example, people building a composable query engine for graph or other
> > data
> > > modality may not choose SQL as the DSL. Decoupling them seems to be a
> > good
> > > idea.
> > >
> >
> > > On Tue, Feb 27, 2024, 6:20 AM Mehmet Ozan Kabak o...@synnada.ai wrote:
> > >
> >
> > > > In this case, maybe we can bring sqlparser-rs into the ASF umbrella
> > > > following the arrow-datafusion model?
> > > >
> >
> > > > Once DataFusion becomes a top-level project, we could move it to
> > > > datafusion-sqlparser-rs — it would be a quasi-independent project
> just
> > like
> > > > how DataFusion is today w.r.t. Arrow. But it would get most benefits
> of
> > > > having a community behind it.
> > > >
> >
> > > > > On Feb 27, 2024, at 2:11 AM, Andrew Lamb al...@influxdata.com
> wrote:
> > > > >
> >
> > > > > Julian, thank you for your insight. I very much agree with it.
> > > > >
> >
> > > > > > I think the ASF is wrong on this. I think it needs to provide a
> > home
> > > > > > for medium-sized projects such as sqlparser-rs in an existing
> > > > > > top-level project;
> > > > >
> >
> > > > > It could be said that DataFusion fits this model -- it isn't really
> > an
> > > > > "Arrow" project but needed a place to live and grow, and the Arrow
> > ASF
> > > > > community provided that.
> > > > >
> >
> > > > > Andrew
> > > > >
> >
> > > > > On Mon, Feb 26, 2024 at 1:09 PM Julian Hyde jh...@apache.org
> wrote:
> > > > >
> >
> > > > > > I am torn on this.
> > > > > >
> >
> > > > > > > On one hand, I am a big fan of components that are standalone -
> > have
> > > > > > no more dependencies than necessary, and are self-evidently
> > > > > > standalone. So, I think that re-absorbing sqlparser-rs back into
> > > > > > DataFusion would not be a good step. It would reduce the
> perception
> > > > > > that it is standalone.
> > > > > >
> >
> > > > > > On the other hand, it sounds as if sqlparser-rs would benefit by
> > > > > > having an Apache-like community around it. DataFusion isn't a
> > perfect
> > > > > > fit - there is not much overlap between DataFusion and
> sqlparser-rs
> > > > > > users - but it takes a lot of effort to create and run a
> top-level
> > > > > > project, and DataFusion is already up and running.
> > > > > >
> >
> > > > > > The tension is that people want to consume components that they
> > > > > > perceive to be standalone, and yet the ASF wants to create
> > communities
> > > > > > that produce either a single large component or sets of
> > highly-coupled
> > > > > > components. The ASF used to do 'umbrella projects' whose
> > sub-projects
> > > > > > were in the same subject area but had little or no dependencies.
> > For
> > > > > > example, Apache DB [ https://db.apache.org/ ] has JDO, Derby and
> > > > > > Torque. And commons included many useful Java libraries. Umbrella
> > > > > > projects caused problems during the Jakarta and Hadoop eras, and
> > now
> > > > > > are strongly discouraged at the ASF.
> > > > > >
> >
> > > > > > I think the ASF is wrong on this. I think it needs to provide a
> > home
> > > > > > for medium-sized projects such as sqlparser-rs in an existing
> > > > > > top-level project; maybe those projects grow into top-level
> > projects,
> > > > > > or maybe they remain medium-sized projects. This is especially
> > > > > > necessary in the Rust community, where there are many exciting
> > > > > > projects, but they are almost all happening outside ASF. (This is
> > > > > > exactly where Java was in ~2005. Maybe we need a rust-commons or
> > > > > > rust-db?)
> > > > > >
> >
> > > > > > My conclusion is to leave sqlparser-rs where it is for now, but
> to
> > > > > > continue talking about what might be an attractive home for it in
> > ASF.
> > > > > >
> >
> > > > > > Julian
> > > > > >
> >
> > > > > > On Mon, Feb 26, 2024 at 8:12 AM Andrew Lamb al...@influxdata.com
> > > > > > wrote:
> > > > > >
> >
> > > > > > > Sorry for the late reply,
> > > > > > >
> >
> > > > > > > I think sqlparser-rs users are quite a bit more varied than
> > DataFusion
> > > > > > > and
> > > > > > > there is not a large overlap between the contributors of the
> two
> > > > > > > projects.
> > > > > > > I currently seem to be the one reviewing / merging most
> > sqlparser-rs
> > > > > > > reviews, and I would definitely love some more help.
> > > > > > >
> >
> > > > > > > However, given that the project is not an Apache project, I did
> > not
> > > > > > > have
> > > > > > > > good luck attracting help. A related discussion is here [1].
> > > > > > >
> >
> > > > > > > If the DataFusion community would like to accelerate releases,
> > we can
> > > > > > > also
> > > > > > > try to do that without bringing it into Apache governance.
> > > > > > > Specifically,
> > > > > > > it
> > > > > > > would be great to have help reviewing the PRs -- the actual
> > release
> > > > > > > process
> > > > > > > is pretty low overhead. The reviews are what take the vast
> > majority of
> > > > > > > the
> > > > > > > maintenance time.
> > > > > > >
> >
> > > > > > > Andrew
> > > > > > >
> >
> > > > > > > On Sat, Feb 17, 2024 at 4:44 PM Aldrin
> octalene....@pm.me.invalid
> > > > > > > wrote:
> > > > > > >
> >
> > > > > > > > do users of sqlparser-rs mostly use datafusion? I don't know
> > the
> > > > > > > > community, but it seems like it would be an annoying change
> > for users
> > > > > > > > who
> > > > > > > > use it with a different query engine. Just a thought
> > > > > > > >
> >
> > > > > > > > Sent from Proton Mail https://proton.me/mail/home for iOS
> > > > > > > >
> >
> > > > > > > > > On Sat, Feb 17, 2024 at 10:26, Andy Grove <andygrov...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> >
> > > > > > > > I agree that it simplifies shipping new SQL features in
> > DataFusion
> > > > > > > > since we
> > > > > > > > can develop the changes in the parser concurrently with the
> > changes in
> > > > > > > > other DataFusion crates and then release them all together.
> > > > > > > >
> >
> > > > > > > > The name of the crate would not need to change, so downstream
> > users
> > > > > > > > should
> > > > > > > > see no impact.
> > > > > > > >
> >
> > > > > > > > We would need to decide if we want to keep a separate version
> > number
> > > > > > > > or
> > > > > > > > bring it in line with DataFusion version numbers (I have no
> > preference
> > > > > > > > either way).
> > > > > > > >
> >
> > > > > > > > On Sat, Feb 17, 2024 at 11:09 AM Mehmet Ozan Kabak
> > o...@synnada.ai
> > > > > > > > wrote:
> > > > > > > >
> >
> > > > > > > > > Doing this will probably reduce the time-to-ship for
> > DataFusion
> > > > > > > > > features
> > > > > > > > > that need parsing support due to increased convenience, so
> > I’m
> > > > > > > > > inclined
> > > > > > > > > to
> > > > > > > > > see it in a positive light.
> > > > > > > > >
> >
> > > > > > > > > What would be the impact of doing this on people who use
> only
> > > > > > > > > sqlparser-rs, if any?
> > > > > > > > >
> >
> > > > > > > > > > On Feb 17, 2024, at 7:16 PM, Andy Grove
> > andygrov...@gmail.com
> > > > > > > > > > wrote:
> > > > > > > > > >
> >
> > > > > > > > > > > The sqlparser-rs project [1] seems to have become the
> > de-facto SQL
> > > > > > > > > > parser
> > > > > > > > > > for Rust, with almost 4 million downloads so far. This
> was
> > > > > > > > > > originally
> > > > > > > > > > part
> > > > > > > > > > of DataFusion very early on, and I moved it into a
> > separate project
> > > > > > > > > > because
> > > > > > > > > > it seemed useful for other projects. This was before
> > DataFusion was
> > > > > > > > > > known
> > > > > > > > > > as a composable query engine, and with hindsight, I
> > probably should
> > > > > > > > > > have
> > > > > > > > > > left it as part of the DataFusion project.
> > > > > > > > > >
> >
> > > > > > > > > > Now that DataFusion has a reputation as a composable
> query
> > engine,
> > > > > > > > > > I
> > > > > > > > > > think
> > > > > > > > > > it would make sense to move this code back into
> > DataFusion, where
> > > > > > > > > > it
> > > > > > > > > > would
> > > > > > > > > > benefit from a larger community of maintainers.
> > > > > > > > > >
> >
> > > > > > > > > > I would like to hear thoughts from the Apache Arrow /
> > DataFusion
> > > > > > > > > > community.
> > > > > > > > > > Does this seem like a good idea?
> > > > > > > > > >
> >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> >
> > > > > > > > > > Andy.
> > > > > > > > > >
> >
> > > > > > > > > > > [1] https://github.com/sqlparser-rs/sqlparser-rs
>
