Thank you QP

Andrew
On Sun, Nov 14, 2021 at 5:02 PM QP Hou <houqp....@gmail.com> wrote:

> Thanks Jiayu, Benson, Micah and Andrew for your input on this. I have
> created an unofficial Github org [1] as a quick and dirty experiment
> for something like spark-packages.org. We should make it clear that
> code developed in this org will still need to go through the donation
> process in order to get into the ASF org.
>
> [1]: https://github.com/datafusion-contrib
>
> On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > I think a separate non-ASF organization, with a central list of
> > extensions like spark-packages.org sounds like a good idea to me.
> >
> > On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > I'll preface this with not being an expert on these matters but
> > > this is my impression.
> > >
> > > > Therefore, I am proposing that we create an unofficial shared
> > > > Github organization to host these Datafusion contrib type
> > > > projects that are only maintained by non-PMC community members.
> > >
> > > I think as long as this is hosted outside of the Apache github
> > > organization, this seems fine. I think being careful around
> > > trade-mark issues and making it clear it isn't officially part of
> > > the Apache DataFusion project are the things to be careful about.
> > > FWIW, I seem to recall this type of model was something proposed
> > > in Spark and there was some tension at the time with branding of
> > > the project. It looks like Spark has settled on having a central
> > > site <https://spark-packages.org/> [1][2] for linking additional
> > > modules and they don't have a common namespace.
> > >
> > > > I am curious if this is something that could be done under the
> > > > Apache governance model? My main goal is to create an unofficial
> > > > incubator type space for community members to develop and
> > > > collaborate on extensions that may or may not be adopted as
> > > > official extensions in the future.
> > >
> > > My limited understanding is either something is governed by the
> > > ASF rules (i.e. PMC/Committers officially recognized by the apache
> > > foundation, along with release requirements) or it isn't, there
> > > really isn't a half-way thing here from the ASF perspective.
> > > Independent projects can choose ASF-like policies and manage
> > > themselves in this manner. The incubator program at the ASF is for
> > > projects that might or might not have sustained interest to
> > > continue (but my understanding is incubation follows all the
> > > process of a normal top-level Apache project). Any code developed
> > > outside of ASF governance needs to go through the donation process
> > > (IP Clearance, etc) to be moved into ASF repos, even if it is
> > > developed by PMC members/committers (see prior discussions on
> > > Arrow2 in Rust and the Julia libraries).
> > >
> > > Cheers,
> > > Micah
> > >
> > > [1] https://spark.apache.org/contributing.html
> > > [2] https://spark-packages.org/
> > >
> > > On Sun, Nov 7, 2021 at 2:31 AM Benson Muite
> > > <benson_mu...@emailplus.org> wrote:
> > >
> > > > A community owned GitHub organization would be helpful. Maybe
> > > > for all other Arrow related projects not just Datafusion. This
> > > > would make them easier to find, and for community members to
> > > > contribute. It could also include a listing of relevant projects
> > > > elsewhere.
> > > >
> > > > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > > > FWIW if there's a way to contribute code pertaining to
> > > > > datafusion I can contribute my version of Java bindings to it.
> > > > >
> > > > > IMO having a central place (instead of linking) for all
> > > > > bindings, 3rd-party libraries, etc. for datafusion would mean
> > > > > more synergy across different languages but I won't go as far
> > > > > as a monorepo because the CI/CD process and release process
> > > > > are unlikely to benefit from it. Maybe a community owned
> > > > > GitHub org?
> > > > >
> > > > > On 2021/11/07 00:52:49 QP Hou wrote:
> > > > >> Hi all,
> > > > >>
> > > > >> I would like to propose a new and more community friendly
> > > > >> governance model for community contributed and maintained
> > > > >> extensions for the datafusion project.
> > > > >>
> > > > >> Over the last year, many datafusion extensions have been
> > > > >> proposed and created by the community including the java
> > > > >> binding, s3 and hdfs[1] object storage implementations, etc.
> > > > >> Right now this code is or will be hosted in individual
> > > > >> github namespaces due to the following reasons:
> > > > >>
> > > > >> * Most of these extensions are not considered part of the
> > > > >> Datafusion core, so the current maintainers prefer to not
> > > > >> have them managed in the main repository. The current python
> > > > >> binding and ballista code base is already adding a decent
> > > > >> amount of overhead to our development process. Adding more
> > > > >> dependent crates will slow us down further without much
> > > > >> upside.
> > > > >>
> > > > >> * Considering the overhead of the official Apache release
> > > > >> process, current Datafusion PMCs don't have the bandwidth to
> > > > >> manage individual releases for these extensions. None of the
> > > > >> authors of these extensions are Arrow PMC members, so they
> > > > >> won't have the access to drive the Apache releases by
> > > > >> themselves.
> > > > >>
> > > > >> Therefore, I am proposing that we create an unofficial
> > > > >> shared Github organization to host these Datafusion contrib
> > > > >> type projects that are only maintained by non-PMC community
> > > > >> members. I think this is strictly better than hosting these
> > > > >> extensions projects in personal github namespaces. If any of
> > > > >> these extensions end up getting significant involvement or
> > > > >> interest from Datafusion committers, then we can promote
> > > > >> them into official projects and provide official Apache
> > > > >> style release support.
> > > > >>
> > > > >> Other alternatives I have considered are:
> > > > >>
> > > > >> * Keep these projects under personal namespaces and only
> > > > >> link them in Datafusion's documentation.
> > > > >>
> > > > >> * Manage these extensions using experimental repos. But as
> > > > >> far as I know, the code owners still need to be a PMC member
> > > > >> in order to perform crates.io releases and it's not intended
> > > > >> for long running projects with no goal for eventual
> > > > >> archival.
> > > > >>
> > > > >> * Create a dedicated mono repo named
> > > > >> apache/datafusion-contrib to host these extensions. However,
> > > > >> this approach also requires PMC members to get involved for
> > > > >> crates.io releases if I understand it correctly.
> > > > >>
> > > > >> I am curious if this is something that could be done under
> > > > >> the Apache governance model? My main goal is to create an
> > > > >> unofficial incubator type space for community members to
> > > > >> develop and collaborate on extensions that may or may not be
> > > > >> adopted as official extensions in the future.
> > > > >>
> > > > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > > > >>
> > > > >> Thanks,
> > > > >> QP