I will put this proposal on hold for now and restart the conversation later
this year once DataFusion is a top-level ASF project.

Thanks again for all the feedback.

Andy.

On Wed, Feb 28, 2024 at 9:58 AM Andy Grove <andygrov...@gmail.com> wrote:

> Thanks for all the feedback so far.
>
> It does seem that the least contentious way to do this would be to follow
> Andrew's suggestion of having a separate
> apache/[arrow-]datafusion-sqlparser repository as this will ensure that we
> do not end up adding any DataFusion dependencies to the sqlparser project,
> and that it continues to have its own release process.
>
> The main benefit here is that it would bring it under ASF governance and
> allow those who have permission from their employers to contribute to
> Apache Arrow/DataFusion to be able to help with the maintenance burden.
>
> Andy.
>
>
>
> On Wed, Feb 28, 2024 at 4:28 AM Andrew Lamb <al...@influxdata.com> wrote:
>
>> One potential way "moving sqlparser-rs into DataFusion" could look is that
>> code/repo is moved from the sqlparser-rs [1] organization to the apache
>> organization. For example
>>
>> https://github.com/sqlparser-rs/sqlparser-rs
>> to
>> https://github.com/apache/datafusion-sqlparser
>>
>> We could continue development separately from any other code, release it
>> as
>> a separate artifact, but use the same overarching governance structure
>> (voting on releases, committer access, etc)
>>
>> To follow this model, I think the largest work item would be to run the IP
>> clearance process, and since sqlparser-rs has many distinct contributors
>> that may take a while
>>
>> Andrew
>>
>>
>>
>> On Wed, Feb 28, 2024 at 1:45 AM Aldrin <octalene....@pm.me.invalid>
>> wrote:
>>
>> > Maybe it would be valuable to more explicitly define "moving back into
>> > DataFusion project".
>> >
>> > I assumed it meant absorbing into the datafusion repo, but it occurs to
>> me
>> > that may not be the case. Then, how would sqlparser-rs be "moved"?
>> >
>> >
>> >
>> > # ------------------------------
>> > # Aldrin
>> >
>> >
>> > https://github.com/drin/
>> > https://gitlab.com/octalene
>> > https://keybase.io/octalene
>> >
>> >
>> > On Tuesday, February 27th, 2024 at 16:20, Chak-Pong Chung <
>> > chakpongch...@gmail.com> wrote:
>> >
>> > > There are cases where people need datafusion but not a SQL parser. For
>> > > example, people building a composable query engine for graph or other
>> > data
>> > > modality may not choose SQL as the DSL. Decoupling them seems to be a
>> > good
>> > > idea.
>> > >
>> >
>> > > On Tue, Feb 27, 2024, 6:20 AM Mehmet Ozan Kabak o...@synnada.ai
>> wrote:
>> > >
>> >
>> > > > In this case, maybe we can bring sqlparser-rs into the ASF umbrella
>> > > > following the arrow-datafusion model?
>> > > >
>> >
>> > > > Once DataFusion becomes a top-level project, we could move it to
>> > > > datafusion-sqlparser-rs — it would be a quasi-independent project
>> just
>> > like
>> > > > how DataFusion is today w.r.t. Arrow. But it would get most
>> benefits of
>> > > > having a community behind it.
>> > > >
>> >
>> > > > > On Feb 27, 2024, at 2:11 AM, Andrew Lamb al...@influxdata.com
>> wrote:
>> > > > >
>> >
>> > > > > Julian, thank you for your insight. I very much agree with it.
>> > > > >
>> >
>> > > > > > I think the ASF is wrong on this. I think it needs to provide a
>> > home
>> > > > > > for medium-sized projects such as sqlparser-rs in an existing
>> > > > > > top-level project;
>> > > > >
>> >
>> > > > > It could be said that DataFusion fits this model -- it isn't
>> really
>> > an
>> > > > > "Arrow" project but needed a place to live and grow, and the Arrow
>> > ASF
>> > > > > community provided that.
>> > > > >
>> >
>> > > > > Andrew
>> > > > >
>> >
>> > > > > On Mon, Feb 26, 2024 at 1:09 PM Julian Hyde jh...@apache.org
>> wrote:
>> > > > >
>> >
>> > > > > > I am torn on this.
>> > > > > >
>> >
>> > > > > > > On one hand, I am a big fan of components that are standalone -
>> > have
>> > > > > > no more dependencies than necessary, and are self-evidently
>> > > > > > standalone. So, I think that re-absorbing sqlparser-rs back into
>> > > > > > DataFusion would not be a good step. It would reduce the
>> perception
>> > > > > > that it is standalone.
>> > > > > >
>> >
>> > > > > > On the other hand, it sounds as if sqlparser-rs would benefit by
>> > > > > > having an Apache-like community around it. DataFusion isn't a
>> > perfect
>> > > > > > fit - there is not much overlap between DataFusion and
>> sqlparser-rs
>> > > > > > users - but it takes a lot of effort to create and run a
>> top-level
>> > > > > > project, and DataFusion is already up and running.
>> > > > > >
>> >
>> > > > > > The tension is that people want to consume components that they
>> > > > > > perceive to be standalone, and yet the ASF wants to create
>> > communities
>> > > > > > that produce either a single large component or sets of
>> > highly-coupled
>> > > > > > components. The ASF used to do 'umbrella projects' whose
>> > sub-projects
>> > > > > > were in the same subject area but had little or no dependencies.
>> > For
>> > > > > > example, Apache DB [ https://db.apache.org/ ] has JDO, Derby
>> and
>> > > > > > Torque. And commons included many useful Java libraries.
>> Umbrella
>> > > > > > projects caused problems during the Jakarta and Hadoop eras, and
>> > now
>> > > > > > are strongly discouraged at the ASF.
>> > > > > >
>> >
>> > > > > > I think the ASF is wrong on this. I think it needs to provide a
>> > home
>> > > > > > for medium-sized projects such as sqlparser-rs in an existing
>> > > > > > top-level project; maybe those projects grow into top-level
>> > projects,
>> > > > > > or maybe they remain medium-sized projects. This is especially
>> > > > > > necessary in the Rust community, where there are many exciting
>> > > > > > projects, but they are almost all happening outside ASF. (This
>> is
>> > > > > > exactly where Java was in ~2005. Maybe we need a rust-commons or
>> > > > > > rust-db?)
>> > > > > >
>> >
>> > > > > > My conclusion is to leave sqlparser-rs where it is for now, but
>> to
>> > > > > > continue talking about what might be an attractive home for it
>> in
>> > ASF.
>> > > > > >
>> >
>> > > > > > Julian
>> > > > > >
>> >
>> > > > > > On Mon, Feb 26, 2024 at 8:12 AM Andrew Lamb
>> al...@influxdata.com
>> > > > > > wrote:
>> > > > > >
>> >
>> > > > > > > Sorry for the late reply,
>> > > > > > >
>> >
>> > > > > > > I think sqlparser-rs users are quite a bit more varied than
>> > DataFusion
>> > > > > > > and
>> > > > > > > there is not a large overlap between the contributors of the
>> two
>> > > > > > > projects.
>> > > > > > > I currently seem to be the one reviewing / merging most
>> > sqlparser-rs
>> > > > > > > reviews, and I would definitely love some more help.
>> > > > > > >
>> >
>> > > > > > > However, given that the project is not an Apache project, I
>> did
>> > not
>> > > > > > > have
>> > > > > > > > good luck attracting help. A related discussion is here [1].
>> > > > > > >
>> >
>> > > > > > > If the DataFusion community would like to accelerate releases,
>> > we can
>> > > > > > > also
>> > > > > > > try to do that without bringing it into Apache governance.
>> > > > > > > Specifically,
>> > > > > > > it
>> > > > > > > would be great to have help reviewing the PRs -- the actual
>> > release
>> > > > > > > process
>> > > > > > > is pretty low overhead. The reviews are what take the vast
>> > majority of
>> > > > > > > the
>> > > > > > > maintenance time.
>> > > > > > >
>> >
>> > > > > > > Andrew
>> > > > > > >
>> >
>> > > > > > > On Sat, Feb 17, 2024 at 4:44 PM Aldrin
>> octalene....@pm.me.invalid
>> > > > > > > wrote:
>> > > > > > >
>> >
>> > > > > > > > do users of sqlparser-rs mostly use datafusion? I don't know
>> > the
>> > > > > > > > community, but it seems like it would be an annoying change
>> > for users
>> > > > > > > > who
>> > > > > > > > use it with a different query engine. Just a thought
>> > > > > > > >
>> >
>> > > > > > > > Sent from Proton Mail https://proton.me/mail/home for iOS
>> > > > > > > >
>> >
>> > > > > > > > On Sat, Feb 17, 2024 at 10:26, Andy Grove <andygrov...@gmail.com> wrote:
>> > > > > > > >
>> >
>> > > > > > > > I agree that it simplifies shipping new SQL features in
>> > DataFusion
>> > > > > > > > since we
>> > > > > > > > can develop the changes in the parser concurrently with the
>> > changes in
>> > > > > > > > other DataFusion crates and then release them all together.
>> > > > > > > >
>> >
>> > > > > > > > The name of the crate would not need to change, so
>> downstream
>> > users
>> > > > > > > > should
>> > > > > > > > see no impact.
>> > > > > > > >
>> >
>> > > > > > > > We would need to decide if we want to keep a separate
>> version
>> > number
>> > > > > > > > or
>> > > > > > > > bring it in line with DataFusion version numbers (I have no
>> > preference
>> > > > > > > > either way).
>> > > > > > > >
>> >
>> > > > > > > > On Sat, Feb 17, 2024 at 11:09 AM Mehmet Ozan Kabak
>> > o...@synnada.ai
>> > > > > > > > wrote:
>> > > > > > > >
>> >
>> > > > > > > > > Doing this will probably reduce the time-to-ship for
>> > DataFusion
>> > > > > > > > > features
>> > > > > > > > > that need parsing support due to increased convenience, so
>> > I’m
>> > > > > > > > > inclined
>> > > > > > > > > to
>> > > > > > > > > see it in a positive light.
>> > > > > > > > >
>> >
>> > > > > > > > > What would be the impact of doing this on people who use
>> only
>> > > > > > > > > sqlparser-rs, if any?
>> > > > > > > > >
>> >
>> > > > > > > > > > On Feb 17, 2024, at 7:16 PM, Andy Grove
>> > andygrov...@gmail.com
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> >
>> > > > > > > > > > The sqlparser-rs project [1] seems to have become the
>> > de-facto SQL
>> > > > > > > > > > parser
>> > > > > > > > > > for Rust, with almost 4 million downloads so far. This
>> was
>> > > > > > > > > > originally
>> > > > > > > > > > part
>> > > > > > > > > > of DataFusion very early on, and I moved it into a
>> > separate project
>> > > > > > > > > > because
>> > > > > > > > > > it seemed useful for other projects. This was before
>> > DataFusion was
>> > > > > > > > > > known
>> > > > > > > > > > as a composable query engine, and with hindsight, I
>> > probably should
>> > > > > > > > > > have
>> > > > > > > > > > left it as part of the DataFusion project.
>> > > > > > > > > >
>> >
>> > > > > > > > > > Now that DataFusion has a reputation as a composable
>> query
>> > engine,
>> > > > > > > > > > I
>> > > > > > > > > > think
>> > > > > > > > > > it would make sense to move this code back into
>> > DataFusion, where
>> > > > > > > > > > it
>> > > > > > > > > > would
>> > > > > > > > > > benefit from a larger community of maintainers.
>> > > > > > > > > >
>> >
>> > > > > > > > > > I would like to hear thoughts from the Apache Arrow /
>> > DataFusion
>> > > > > > > > > > community.
>> > > > > > > > > > Does this seem like a good idea?
>> > > > > > > > > >
>> >
>> > > > > > > > > > Thanks,
>> > > > > > > > > >
>> >
>> > > > > > > > > > Andy.
>> > > > > > > > > >
>> >
>> > > > > > > > > > [1] https://github.com/sqlparser-rs/sqlparser-rs
>>
>
