I think a separate non-ASF organization, with a central list of extensions
like spark-packages.org, sounds like a good idea.

On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I'll preface this with not being an expert on these matters but this is my
> impression.
>
>
> > Therefore, I am proposing that we create an unofficial shared Github
> > organization to host these Datafusion contrib type projects that are
> > only maintained by non-PMC community members.
>
>
> I think as long as this is hosted outside of the Apache github
> organization, this seems fine.  The things to be careful about are
> trademark issues and making it clear that it isn't officially part of the
> Apache DataFusion project.  FWIW, I seem to
> recall this type of model was something proposed in Spark and there was
> some tension at the time with branding of the project.  It looks like Spark
> has settled on having a central site <https://spark-packages.org/> [1][2]
> for linking additional modules and they don't have a common namespace.
>
>
> > I am curious if this is something that could be done under the Apache
> > governance model? My main goal is to create an unofficial incubator
> > type space for community members to develop and collaborate on
> > extensions that may or may not be adopted as official extensions in
> > the future.
>
>
> My limited understanding is that either something is governed by the ASF
> rules (i.e. PMC/Committers officially recognized by the Apache foundation,
> along with release requirements) or it isn't; there really isn't a halfway
> option here from the ASF perspective.  Independent projects can choose ASF-like
> policies and manage themselves in this manner. The incubator program at the
> ASF is for projects that might or might not have sustained interest to
> continue (but my understanding is incubation follows all the process of a
> normal top-level Apache project).  Any code developed outside of ASF
> governance needs to go through the donation process (IP Clearance, etc) to
> be moved into ASF repos, even if it is developed by PMC members/committers
> (see prior discussions on Arrow2 in Rust and the Julia libraries).
>
> Cheers,
> Micah
>
> [1] https://spark.apache.org/contributing.html
> [2] https://spark-packages.org/
>
>
> On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org>
> wrote:
>
> > A community owned GitHub organization would be helpful. Maybe for all
> > other Arrow related projects not just Datafusion. This would make them
> > easier to find, and for community members to contribute. It could also
> > include a listing of relevant projects elsewhere.
> >
> > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > FWIW if there's a way to contribute code pertaining to datafusion I can
> > > contribute my version of Java bindings to it.
> > >
> > > IMO having a central place (instead of linking) for all bindings, 3rd
> > > libraries, etc. for datafusion would mean more synergy across different
> > > languages but I won't go as far as a monorepo because the CI/CD process
> > > and release process are unlikely to benefit from it. Maybe a community
> > > owned GitHub org?
> > >
> > > On 2021/11/07 00:52:49 QP Hou wrote:
> > >> Hi all,
> > >>
> > >> I would like to propose a new and more community friendly governance
> > >> model for community contributed and maintained extensions for the
> > >> datafusion project.
> > >>
> > >> Over the last year, many datafusion extensions have been proposed and
> > >> created by the community including the java binding, s3 and hdfs[1]
> > >> object storage implementations, etc. Right now this code is or will
> > >> be hosted in individual github namespaces due to the following
> > >> reasons:
> > >>
> > >> * Most of these extensions are not considered part of the Datafusion
> > >> core, so the current maintainers prefer to not have them managed in
> > >> the main repository. The current python binding and ballista code base
> > >> is already adding a decent amount of overhead to our development
> > >> process. Adding more dependent crates will slow us down further
> > >> without much upside.
> > >>
> > >> * Considering the overhead of the official Apache release process,
> > >> current Datafusion PMC members don't have the bandwidth to manage individual
> > >> releases for these extensions. None of the authors of these extensions
> > >> are Arrow PMC members, so they won't have the access to drive the
> > >> Apache releases by themselves.
> > >>
> > >> Therefore, I am proposing that we create an unofficial shared Github
> > >> organization to host these Datafusion contrib type projects that are
> > >> only maintained by non-PMC community members. I think this is strictly
> > >> better than hosting these extension projects in personal github
> > >> namespaces. If any of these extensions end up getting significant
> > >> involvement or interest from Datafusion committers, then we can
> > >> promote them into official projects and provide official Apache style
> > >> release support.
> > >>
> > >> Other alternatives I have considered are:
> > >>
> > >> * Keep these projects under personal namespaces and only link them in
> > >> Datafusion's documentation.
> > >>
> > >> * Manage these extensions using experimental repos. But as far as I
> > >> know, the code owners still need to be PMC members in order to
> > >> perform crates.io releases and it's not intended for long running
> > >> projects with no goal of eventual archival.
> > >>
> > >> * Create a dedicated mono repo named apache/datafusion-contrib to host
> > >> these extensions. However, this approach also requires PMC members to
> > >> get involved for crates.io releases if I understand it correctly.
> > >>
> > >> I am curious if this is something that could be done under the Apache
> > >> governance model? My main goal is to create an unofficial incubator
> > >> type space for community members to develop and collaborate on
> > >> extensions that may or may not be adopted as official extensions in
> > >> the future.
> > >>
> > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > >>
> > >> Thanks,
> > >> QP
> > >>
> > >
> >
> >
>
