Re: [VOTE] Accept Druid into the Apache Incubator

Chris Mattmann Thu, 22 Feb 2018 11:05:45 -0800

+1 binding.

Thanks,
Chris




On 2/22/18, 11:04 AM, "Julian Hyde" <[email protected]> wrote:

    Hi all,
    
    After some discussion on the Druid proposal[1], I'd like to
    start a vote on accepting Druid into the Apache Incubator,
    per the ASF policy[2] and voting rules[3].
    
    A vote for accepting a new Apache Incubator podling is a
    majority vote for which only Incubator PMC member votes are
    binding. Votes from other people are also welcome as an
    indication of people's enthusiasm (or lack thereof).
    
    Please do not use this VOTE thread for discussions.  If
    needed, start a new thread instead.
    
    This vote will run for at least 72 hours. Please VOTE as
    follows:
     [ ] +1 Accept Druid into the Apache Incubator
     [ ] +0 Abstain
     [ ] -1 Do not accept Druid into the Apache Incubator
            because ...
    
    The proposal is listed below, but you can also access it on
    the wiki[4].
    
    Julian
    
    [1] https://lists.apache.org/thread.html/b95f90a30b6e8587e9b108f368b07c1b3e23e25ca592448d9c9f81e2@%3Cgeneral.incubator.apache.org%3E
    
    [2] https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor
    
    [3] http://www.apache.org/foundation/voting.html
    
    [4] https://wiki.apache.org/incubator/DruidProposal
    
    
    
    
    
    = Druid Proposal =
    
    == Abstract ==
    
    Druid is a high-performance, column-oriented, distributed
    data store.
    
    == Proposal ==
    
    Druid is an open source data store designed for real-time
    exploratory analytics on large data sets. Druid's key
    features are a column-oriented storage layout, a distributed
    shared-nothing architecture, and the ability to generate and
    leverage indexing and caching structures. Druid is typically
    deployed in clusters of tens to hundreds of nodes, and has
    the ability to load data from Apache Kafka and Apache
    Hadoop, among other data sources. Druid offers two query
    languages: a SQL dialect (powered by Apache Calcite) and a
    JSON-over-HTTP API.
    
    Druid was originally developed to power a slice-and-dice
    analytical UI built on top of large event streams. The
    original use case for Druid targeted ingest rates of
    millions of records/sec, retention of over a year of data,
    and query latencies of sub-second to a few seconds. Many
    people can benefit from such capability, and many already
    have (see http://druid.io/druid-powered.html). In addition,
    new use cases have emerged since Druid's original
    development, such as OLAP acceleration of data warehouse
    tables and more highly concurrent applications operating
    with relatively narrower queries.
    
    == Background ==
    
    Druid is a data store designed for fast analytics. It would
    typically be used in lieu of more general purpose query
    systems like Hadoop MapReduce or Spark when query latency is
    of the utmost importance. Druid is often used as a data
    store for powering GUI analytical applications.
    
    The buzzwordy description of Druid is a high-performance,
    column-oriented, distributed data store. What we mean by
    this is:
    
    * "high performance": Druid aims to provide the lowest query
      latency and highest ingest rates possible.
    * "column-oriented": Druid stores data in a column-oriented
      format, like most other systems designed for analytics. It
      can also store indexes along with the columns.
    * "distributed": Druid is deployed in clusters, typically of
      tens to hundreds of nodes.
    * "data store": Druid loads your data and stores a copy of
      it on the cluster's local disks (and may cache it in
      memory). It doesn't query your data from some other
      storage system.
    
    == Rationale ==
    
    Druid is a mature, active project with a large number of
    production installations, dozens of contributors to each
    release, and multiple vendors offering professional
    support. Given Druid's strong community, its close
    integration with many other Apache projects (such as Kafka,
    Hadoop, and Calcite), and its pre-existing Apache-inspired
    governance structure, we feel that Apache is the best home
    for the project on a long-term basis.
    
    == Current Status ==
    
    === Meritocracy ===
    
    Since Druid was first open sourced, the original developers
    have solicited contributions from others, including through
    our blog, the project mailing lists, and through accepting
    GitHub pull requests. We have an Apache-inspired governance
    structure with a PMC and committers, and our committer ranks
    include a good number of people from outside the original
    development team.
    
    === Community ===
    
    The Druid core developers have sought to nurture a community
    throughout the life of the project. We use GitHub as the
    focal point for bug reports and code contributions, and the
    mailing lists for most other discussion. To try to make
    people feel welcome, we've also spelled this out on a
    "CONTRIBUTE" link from the project page:
    http://druid.io/community/. Today we have an active
    contributor base (a typical release has ~40 contributors)
    and mailing list.
    
    === Core Developers ===
    
    Druid enjoys good diversity of committer affiliation. The
    most active developers over the past year are affiliated
    with four different companies: Imply, Metamarkets, Yahoo,
    and Hortonworks. Many Druid committers are also committers
    on other ASF projects, including Apache Airflow,
    Apache Curator, and Apache Calcite. The original developers
    of Druid remain involved in the project.
    
    === Alignment ===
    
    Druid's current governance structure is Apache-inspired with
    a PMC and committers chosen by a meritocratic
    process. Additionally, Druid integrates with a number of
    other Apache projects, including Kafka, Hadoop, Hive,
    Calcite, Superset (incubating), Spark, Curator, and
    ZooKeeper.
    
    == Known Risks ==
    
    === Orphaned products ===
    
    The risk of Druid becoming orphaned is low, due to a diverse
    committer base that is invested in the future of the
    project.
    
    === Inexperience with Open Source ===
    
    Druid's core developers have been running it as a
    community-oriented open source project for some time now,
    and many of them are committers on other open source
    projects as well, including Apache Airflow, Apache Curator,
    and Apache Calcite.
    
    === Homogenous Developers ===
    
    Druid's current diversity of committer affiliation means
    that we have become accustomed to working collaboratively
    and in the open. We hope that a transition to the ASF helps
    Druid's contributor base become even more diverse.
    
    === Reliance on Salaried Developers ===
    
    Druid's user base and contributor base skew heavily towards
    salaried developers. We believe this is natural since Druid
    is a technology designed to be deployed on large clusters,
    and due to this, tends to be deployed by organizations
    rather than by individuals. Nevertheless, many current Druid
    developers have continued working on the project even
    through job changes, which we take to be a good sign of
    developer commitment and personal interest.
    
    === Relationships with Other Apache Products ===
    
    Druid integrates with a number of other Apache
    projects. Druid internally uses Calcite for SQL planning,
    and Curator and ZooKeeper for coordination.  Druid can read
    data in Avro or Parquet format. Druid can load data from
    streams in Kafka or from files in Hadoop. Druid integrates
    with Hive as an option for SQL query acceleration. Druid
    data can be visualized by Superset (incubating).
    
    === An Excessive Fascination with the Apache Brand ===
    
    Druid is a successful project with a diverse community. The
    main reason for pursuing incubation is to find a stable,
    long-term home for the project with a well-known governance
    philosophy.
    
    == Required Resources ==
    
    === Mailing lists ===
    
    We would like to migrate the existing Druid mailing lists
    from Google Groups to Apache.
    
    * druid-user@googlegroups -> [email protected]
    * druid-development@googlegroups -> [email protected]
    
    === Source control ===
    
    Druid development currently takes place on GitHub. We would
    like to continue using GitHub, if possible, in order to
    preserve the workflows the community has developed around
    GitHub pull requests.
    
    === Issue tracking ===
    
    Druid currently uses GitHub issues for issue tracking. We
    would like to migrate to Apache JIRA at
    http://issues.apache.org/jira/browse/DRUID.
    
    == Documentation ==
    
    Druid's documentation can be found at
    http://druid.io/docs/latest/.
    
    == Initial Source ==
    
    Druid was initially open-sourced by Metamarkets in 2012 and
    has been run in a community-governed fashion since then. The
    code is currently hosted at https://github.com/druid-io/ and
    includes the following repositories:
    
    * druid (primary repository)
    * druid-console (web console for Druid)
    * druid-io.github.io (source for Druid's website at
      http://druid.io/)
    * tranquility (realtime stream push client for Druid)
    * docker-druid (Docker image for Druid)
    * pydruid (Python library)
    * RDruid (R library)
    * oss-parent (Maven POM files)
    
    == Source and Intellectual Property Submission Plan ==
    
    A complete set of the open source code needs to be licensed
    from the owning organization to the Foundation. Commercial
    legal counsel for the owning organization will review the
    standard Foundation licensing paperwork and propose any
    updates as needed. This license will enable Apache to
    incubate and manage the Druid project moving forward.
    
    Other Druid paraphernalia to be transferred to Apache
    consists of:
    
    * GitHub organization at https://github.com/druid-io/
    * Twitter account at https://twitter.com/druidio
    * "druid.io" domain name
    * "Druid" trademark assignment per Foundation standard
      paperwork. The trademark assignment paperwork shall be
      reviewed by the owning organization's commercial and IP
      counsel
    * CLAs - all rights in the code licensed above should
      encompass the CLAs that existed between developers and
      owning organization
    
    A copyright license to the code, trademark assignment of
    Druid, and transfer of other paraphernalia to Apache should
    be sufficient to cover all rights required by Apache to
    operate the project.
    
    == External Dependencies ==
    
    External dependencies distributed with Druid currently all
    have one of the following Category A or B licenses: ASL,
    BSD, CDDL, EPL, MIT, MPL; with one exception: the optional
    Druid MySQL metadata store extension depends on MySQL
    Connector/J, which is GPL licensed. Druid currently packages
    this as a separate download; see our current presentation
    on: http://druid.io/downloads.html. As part of incubation we
    intend to determine the best strategy for handling the MySQL
    extension.
    
    == Cryptography ==
    
    Not applicable.
    
    == Initial Committers ==
    
    The initial committers for incubation are the current set of
    committers on Druid who have expressed interest in being
    involved in Apache incubation.  Affiliations are listed
    where relevant. We may seek to add other committers during
    incubation; for example, we would want to add any current
    Druid committers who express an interest after incubation
    begins.
    
    * Charles Allen ([email protected]) (Snap)
    * David Lim ([email protected]) (Imply)
    * Eric Tschetter ([email protected]) (Splunk)
    * Fangjin Yang ([email protected]) (Imply)
    * Gian Merlino ([email protected]) (Imply)
    * Himanshu Gupta ([email protected]) (Oath)
    * Jihoon Son ([email protected]) (Imply)
    * Jonathan Wei ([email protected]) (Imply)
    * Maxime Beauchemin ([email protected]) (Lyft)
    * Mohamed Slim Bouguerra ([email protected]) (Hortonworks)
    * Nishant Bangarwa ([email protected]) (Hortonworks)
    * Parag Jain ([email protected]) (Oath)
    * Roman Leventov ([email protected]) (Metamarkets)
    * Xavier Léauté ([email protected]) (Confluent)
    
    == Sponsors ==
    
    * Champion: Julian Hyde
    * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
    * Sponsoring entity: Apache Incubator
    
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: [email protected]
    For additional commands, e-mail: [email protected]
    
    



