+1 (non-binding)

On Fri, Mar 1, 2024 at 18:58 kazuyuki tanimura <ktanim...@apple.com.invalid> wrote:
> +1 (non-binding)
>
> Kazu
>
> > On Mar 1, 2024, at 5:44 PM, L. C. Hsieh <vii...@gmail.com> wrote:
> >
> > +1 (binding)
> >
> > On Fri, Mar 1, 2024 at 1:25 PM Joris Van den Bossche
> > <jorisvandenboss...@gmail.com> wrote:
> >>
> >> +1 (binding)
> >>
> >> On Fri, 1 Mar 2024 at 22:18, Sutou Kouhei <k...@clear-code.com> wrote:
> >>>
> >>> +1
> >>>
> >>> In <CAFhtnRy2J9GCU6e2K56-KPVc=gawemuipeyhmnwcd+htkfa...@mail.gmail.com>
> >>> "[VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project" on Fri, 1 Mar 2024 06:33:08 -0500,
> >>> Andrew Lamb <al...@influxdata.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> As we have discussed[1][2] I would like to vote on the proposal to
> >>>> create a new Apache Top Level Project for DataFusion. The text of the
> >>>> proposed resolution and background document is copy/pasted below
> >>>>
> >>>> If the community is in favor of this, we plan to submit the resolution
> >>>> to the ASF board for approval with the next Arrow report (for the
> >>>> April 2024 board meeting).
> >>>>
> >>>> The vote will be open for at least 7 days.
> >>>>
> >>>> [ ] +1 Accept this Proposal
> >>>> [ ] +0
> >>>> [ ] -1 Do not accept this proposal because...
> >>>>
> >>>> Andrew
> >>>>
> >>>> [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> >>>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
> >>>>
> >>>> ---------- Proposed Resolution ---------
> >>>>
> >>>> Resolution to Create the Apache DataFusion Project from the Apache
> >>>> Arrow DataFusion Sub Project
> >>>>
> >>>> =============================================================
> >>>>
> >>>> X. Establish the Apache DataFusion Project
> >>>>
> >>>> WHEREAS, the Board of Directors deems it to be in the best
> >>>> interests of the Foundation and consistent with the
> >>>> Foundation's purpose to establish a Project Management
> >>>> Committee charged with the creation and maintenance of
> >>>> open-source software related to an extensible query engine
> >>>> for distribution at no charge to the public.
> >>>>
> >>>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> >>>> Committee (PMC), to be known as the "Apache DataFusion Project",
> >>>> be and hereby is established pursuant to Bylaws of the
> >>>> Foundation; and be it further
> >>>>
> >>>> RESOLVED, that the Apache DataFusion Project be and hereby is
> >>>> responsible for the creation and maintenance of software
> >>>> related to an extensible query engine; and be it further
> >>>>
> >>>> RESOLVED, that the office of "Vice President, Apache DataFusion" be
> >>>> and hereby is created, the person holding such office to
> >>>> serve at the direction of the Board of Directors as the chair
> >>>> of the Apache DataFusion Project, and to have primary responsibility
> >>>> for management of the projects within the scope of
> >>>> responsibility of the Apache DataFusion Project; and be it further
> >>>>
> >>>> RESOLVED, that the persons listed immediately below be and
> >>>> hereby are appointed to serve as the initial members of the
> >>>> Apache DataFusion Project:
> >>>>
> >>>> * Andy Grove (agr...@apache.org)
> >>>> * Andrew Lamb (al...@apache.org)
> >>>> * Daniël Heres (dhe...@apache.org)
> >>>> * Jie Wen (jake...@apache.org)
> >>>> * Kun Liu (liu...@apache.org)
> >>>> * Liang-Chi Hsieh (vii...@apache.org)
> >>>> * Qingping Hou: (ho...@apache.org)
> >>>> * Wes McKinney(w...@apache.org)
> >>>> * Will Jones (wjones...@apache.org)
> >>>>
> >>>> RESOLVED, that the Apache DataFusion Project be and hereby
> >>>> is tasked with the migration and rationalization of the Apache
> >>>> Arrow DataFusion sub-project; and be it further
> >>>>
> >>>> RESOLVED, that all responsibilities pertaining to the Apache
> >>>> Arrow DataFusion sub-project encumbered upon the
> >>>> Apache Arrow Project are hereafter discharged.
> >>>>
> >>>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
> >>>> be appointed to the office of Vice President, Apache DataFusion, to
> >>>> serve in accordance with and subject to the direction of the
> >>>> Board of Directors and the Bylaws of the Foundation until
> >>>> death, resignation, retirement, removal or disqualification,
> >>>> or until a successor is appointed.
> >>>> =============================================================
> >>>>
> >>>>
> >>>> -------
> >>>>
> >>>>
> >>>> Summary:
> >>>>
> >>>> We propose creating a new top level project, Apache DataFusion, from
> >>>> an existing sub project of Apache Arrow to facilitate additional
> >>>> community and project growth.
> >>>>
> >>>> Abstract
> >>>>
> >>>> Apache Arrow DataFusion[1] is a very fast, extensible query engine
> >>>> for building high-quality data-centric systems in Rust, using the
> >>>> Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
> >>>> APIs, excellent performance, built-in support for CSV, Parquet, JSON,
> >>>> and Avro, extensive customization, and a great community.
> >>>>
> >>>> [1] https://arrow.apache.org/datafusion/
> >>>>
> >>>>
> >>>> Proposal
> >>>>
> >>>> We propose creating a new top level ASF project, Apache DataFusion,
> >>>> governed initially by a subset of the Apache Arrow project’s PMC and
> >>>> committers. The project’s code is in five existing git repositories,
> >>>> currently governed by Apache Arrow which would transfer to the new top
> >>>> level project.
> >>>>
> >>>> Background
> >>>>
> >>>> When DataFusion was initially donated to the Arrow project, it did not
> >>>> have a strong enough community to stand on its own. It has since grown
> >>>> significantly, and benefited immensely from being part of Arrow and
> >>>> nurturing of the Apache Way, and now has a community strong enough to
> >>>> stand on its own and that would benefit from focused governance
> >>>> attention.
> >>>>
> >>>> The community has discussed this idea publicly for more than 6 months
> >>>> https://github.com/apache/arrow-datafusion/discussions/6475 and
> >>>> briefly on the Arrow PMC mailing list
> >>>> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
> >>>> of the time of this writing both had exclusively positive reactions.
> >>>>
> >>>> Several current members of the Arrow PMC are both active contributors
> >>>> to DataFusion and understand and believe deeply in the Apache Way, and
> >>>> play active governance roles in the Arrow project as PMC members and
> >>>> PMC chairs, guiding the community, and releasing software versions.
> >>>> With this existing governance experience and structure, the new top
> >>>> level project will be able to function well immediately and
> >>>> independently.
> >>>>
> >>>> Overview of DataFusion
> >>>>
> >>>> Current Status
> >>>>
> >>>> Meritocracy
> >>>>
> >>>> DataFusion has been developed as part of Apache Arrow and thus has
> >>>> been operating as a meritocracy. Many of the developers of DataFusion
> >>>> are Arrow PMC members or committers. The DataFusion project plans to
> >>>> continue adding new PMC and committers as the project matures and
> >>>> grows.
> >>>>
> >>>> Community
> >>>>
> >>>> The DataFusion development team seeks to foster the development and
> >>>> user communities. We hope that becoming a separate project will help
> >>>> both Arrow and DataFusion communities by being more focused. Focused
> >>>> governance will make it easier to grow the community of committers and
> >>>> PMC members and make the organization more clear to others.
> >>>>
> >>>> Alignment
> >>>>
> >>>> The ASF is a natural host for DataFusion given that it is already the
> >>>> home of Arrow, Parquet, and other related distributed system, storage
> >>>> and query execution systems.
> >>>>
> >>>> Project Leadership
> >>>>
> >>>> Proposed Initial PMC
> >>>>
> >>>> We propose the following people as the initial DataFusion PMC members.
> >>>> This is a subset of the existing Arrow PMC members who contribute to
> >>>> DataFusion https://people.apache.org/phonebook.html?unix=arrow
> >>>>
> >>>> Andy Grove (agrove): Arrow PMC Chair
> >>>> Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
> >>>> Daniël Heres (dheres) Arrow PMC
> >>>> Jie Wen (jakevin): Arrow PMC, Doris Committer
> >>>> Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
> >>>> Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
> >>>> Qingping Hou: (houqp): Arrow PMC
> >>>> Wes McKinney(wesm): Arrow PMC, ASF Member
> >>>> Will Jones (wjones127): Arrow PMC
> >>>>
> >>>> We’d like to propose Andrew Lamb as the initial Chair, (and thus ASF
> >>>> VP) for the DataFusion project.
> >>>>
> >>>> Affiliations
> >>>>
> >>>> Andy Grove (agrove): NVidia
> >>>> Andrew Lamb (alamb): InfluxData
> >>>> Daniël Heres (dheres): Coralogix
> >>>> Jie Wen (jakevin): SelectDB
> >>>> Kun Liu (liukun): Ebay
> >>>> Liang-Chi Hsieh (viirya): Apple
> >>>> Qingping Hou: (houqp): Scribd
> >>>> Wes McKinney(wesm): Posit
> >>>> Will Jones (wjones127): LanceDB
> >>>>
> >>>> Proposed Initial Committers
> >>>>
> >>>> In addition to the PMC, we propose the following people as the initial
> >>>> DataFusion committers. This is a subset of the existing Arrow
> >>>> committers who contribute to DataFusion
> >>>> https://people.apache.org/phonebook.html?unix=arrow
> >>>>
> >>>> akurmustafa Mustafa Akur (Synnada)
> >>>> avantgardner Brent Gardner (Coralogix)
> >>>> comphead Oleks V. (Unaffiliated)
> >>>> jayzhan Jay Zhan (Unaffiliated)
> >>>> jeffreyvo Jeffry Vo (Unaffiliated)
> >>>> jiayuliu Liu Jiayu (Airbnb)
> >>>> mete Metehan Yildirim (Synnada)
> >>>> mingmwang Wang Mingming (Ebay)
> >>>> mneumann Marco Neumann (InfluxData)
> >>>> nju_yaho Zhong Yanghong (Ebay)
> >>>> ozankabak Mehmet Ozan Kabak (Synnada)
> >>>> paddyhoran Paddy Horan (Assured Allies)
> >>>> rdettai Rémi Dettai (Cloudfuse)
> >>>> sunchao Chao Sun (Apple)
> >>>> thinkharderdev Daniel Harris (Coralogix)
> >>>> tustvold Raphael Taylor-Davies (InfluxData)
> >>>> wayne Ruihang Xia (Greptime)
> >>>> xudong963 Xudong Wang (ByteDance)
> >>>> yjshen Yijie Shen (Space and Time)
> >>>> yangjiang Yang Jiang (ebay)
> >>>>
> >>>>
> >>>> Risk Assessments
> >>>>
> >>>> Naming / Trademarks
> >>>>
> >>>> As a sub-project of Arrow, the DataFusion name has been used for over
> >>>> 4 years without any known issues. A podling name search did not turn
> >>>> up any concerns and was approved:
> >>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> >>>>
> >>>> Legal / IP Clearance
> >>>>
> >>>> All DataFusion code has either been donated to the Arrow project with
> >>>> appropriate IP clearance or has been developed directly under ASF
> >>>> processes and procedures. Thus creating a new top level project poses
> >>>> no new Legal or IP risks.
> >>>>
> >>>> Code Extraction
> >>>>
> >>>> The relevant code is already in 5 separate repositories:
> >>>> https://github.com/apache/arrow-datafusion/
> >>>> https://github.com/apache/arrow-datafusion-python
> >>>> https://github.com/apache/arrow-ballista
> >>>> https://github.com/apache/arrow-ballista-python
> >>>> https://github.com/apache/arrow-datafusion-comet
> >>>>
> >>>> We foresee no issues with code extraction and propose these
> >>>> repositories be renamed to reflect top level projects
> >>>>
> >>>> Note: https://github.com/apache/arrow-rs, the Rust implementation of
> >>>> Arrow, would remain part of the Arrow project.
> >>>>
> >>>> Orphaned Products
> >>>>
> >>>> DataFusion is known to be used in many open source and commercial
> >>>> projects
> >>>> https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users,
> >>>> has had multiple commits daily for several years, and its adoption and
> >>>> number of contributors appears to be growing. We do not foresee the
> >>>> project being orphaned in the next several years.
> >>>>
> >>>> Inexperience with Open Source
> >>>>
> >>>> The proposed PMC has extensive experience with Apache Arrow and other
> >>>> Apache projects, and includes PMC members, PMC chairs and an ASF
> >>>> Member. The DataFusion PMC and more experienced committers will
> >>>> continue to coach new community members who may be less familiar with
> >>>> the Apache Way.
> >>>>
> >>>> Homogeneous Developers
> >>>>
> >>>> The 9 proposed PMC members are from 9 different employers and the
> >>>> proposed committers are similarly distributed across affiliations. No
> >>>> specific entity employs more than 3 total proposed developers.
> >>>>
> >>>> Reliance on Salaried Developers
> >>>>
> >>>> A substantial amount of work on DataFusion has been by salaried
> >>>> developers, but it also has a long tradition of attracting
> >>>> contributions from students and hobbyists and we plan no changes in
> >>>> contribution structure.
> >>>>
> >>>> Relationships with Other Apache Products
> >>>>
> >>>> DataFusion will obviously have a strong relationship with the Arrow
> >>>> project given the overlap in people. We don’t foresee close
> >>>> collaboration with other projects at this time.
> >>>>
> >>>> Cryptography
> >>>>
> >>>> DataFusion does not directly support encryption and there are no
> >>>> near-term plans to add support for encryption. Users who need this
> >>>> functionality can use the extension APIs.
> >>>>
> >>>> Required Resources
> >>>>
> >>>> Mailing Lists
> >>>>
> >>>> - priv...@datafusion.apache.org for private PMC discussions (with
> >>>> moderated subscriptions)
> >>>> - d...@datafusion.apache.org
> >>>> - comm...@datafusion.apache.org
> >>>> - u...@datafusion.apache.org
> >>>>
> >>>> Version Control
> >>>>
> >>>> We propose to continue to use git for source control and github for
> >>>> hosting and testing resources.
> >>>>
> >>>> We also need to rename the github repositories to reflect the new top
> >>>> level names:
> >>>>
> >>>> https://github.com/apache/arrow-datafusion/ → apache/datafusion
> >>>> https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
> >>>> https://github.com/apache/arrow-ballista → apache/datafusion-ballista
> >>>> https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
> >>>> https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
> >>>>
> >>>>
> >>>>
> >>>> Issue Tracking
> >>>>
> >>>> DataFusion would continue to use github for its issue tracking and
> >>>> communications
> >>>>
> >>>> Other Resources
> >>>>
> >>>> The existing repositories already make use of existing Apache
> >>>> infrastructure, and we expect no change in the initial resource usage.
> >>>> As the project continues to grow, we expect continued infrastructure
> >>>> demand growth.
> >>>>
> >>>>
> >>>> FAQ: Has a sub project been promoted to a top level project before?
> >>>>
> >>>> Yes, and it appears to happen commonly. The Arrow project itself was
> >>>> created as a top level project from work that started in Apache Drill,
> >>>> and there are many sub projects of Hadoop that spun out as their own
> >>>> top level projects such as Mahout, Avro and HBase:
> >>>> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
> >>>>
> >>>>
> >>>>
> >>>> Related material:
> >>>> Name search request / research for DataFusion:
> >>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> >>>> Discussion about this proposal on the arrow mailing list:
> >>>> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> >>>> Discussion about which repositories on the arrow mailing list:
> >>>> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
> >>>> Discussion about initial PMC on the arrow mailing list:
> >>>> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
> >>>> Discussion in github about creating a new DataFusion top level
> >>>> project: https://github.com/apache/arrow-datafusion/discussions/6475
> >>>> Discussion about graduating on incubator list:
> >>>> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
> >>>> Original Proposal for the Arrow project:
> >>>> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
> >