Re: [DISCUSS] Druid incubation proposal

Brian McCallister Thu, 22 Feb 2018 09:08:59 -0800

+1 - glad to see Druid finally (hopefully) landing here!

On Wed, Feb 21, 2018 at 10:57 PM, Henning Schmiedehausen <
henn...@schmiedehausen.org> wrote:


> Woot!
>
> +1 for druid incubation.
>
> -h
>
>
>
> On Fri, Feb 16, 2018 at 12:15 PM, Gian Merlino <g...@apache.org> wrote:
>
> > Hi all,
> >
> > I would like to open up a discussion about incubating Druid at Apache.
> I've
> > included a proposal in this mail and have also posted a draft at
> > https://wiki.apache.org/incubator/DruidProposal. More information about
> > Druid is also available on our project web site at: http://druid.io/
> >
> > Thanks for your consideration!
> >
> > Gian
> >
> > = Druid Proposal =
> >
> > == Abstract ==
> >
> > Druid is a high-performance, column-oriented, distributed data store.
> >
> > == Proposal ==
> >
> > Druid is an open source data store designed for real-time exploratory
> > analytics on large data sets. Druid's key features are a column-oriented
> > storage layout, a distributed shared-nothing architecture, and the ability to
> > generate and leverage indexing and caching structures. Druid is typically
> > deployed in clusters of tens to hundreds of nodes, and has the ability to
> > load data from Apache Kafka and Apache Hadoop, among other data sources.
> > Druid offers two query languages: a SQL dialect (powered by Apache
> Calcite)
> > and a JSON-over-HTTP API.
> >
> > Druid was originally developed to power a slice-and-dice analytical UI
> > built on top of large event streams. The original use case for Druid
> > targeted ingest rates of millions of records/sec, retention of over a
> year
> > of data, and query latencies of sub-second to a few seconds. Many people
> > can benefit from such capability, and many already have (see
> > http://druid.io/druid-powered.html). In addition, new use cases have
> > emerged since Druid's original development, such as OLAP acceleration of
> > data warehouse tables and more highly concurrent applications operating
> > with relatively narrow queries.
> >
> > == Background ==
> >
> > Druid is a data store designed for fast analytics. It would typically be
> > used in lieu of more general purpose query systems like Hadoop !MapReduce
> > or Spark when query latency is of the utmost importance. Druid is often
> > used as a data store for powering GUI analytical applications.
> >
> > The buzzwordy description of Druid is a high-performance,
> column-oriented,
> > distributed data store. What we mean by this is:
> >
> >  * "high performance": Druid aims to provide low query latency and high
> > ingest rates.
> >  * "column-oriented": Druid stores data in a column-oriented format, like
> > most other systems designed for analytics. It can also store indexes
> along
> > with the columns.
> >  * "distributed": Druid is deployed in clusters, typically of tens to
> > hundreds of nodes.
> >  * "data store": Druid loads your data and stores a copy of it on the
> > cluster's local disks (and may cache it in memory). It doesn't query your
> > data from some other storage system.
> >
> > == Rationale ==
> >
> > Druid is a mature, active project with a large number of production
> > installations, dozens of contributors to each release, and multiple
> vendors
> > offering professional support. Given Druid's strong community, its close
> > integration with many other Apache projects (such as Kafka, Hadoop, and
> > Calcite), and its pre-existing Apache-inspired governance structure, we
> > feel that Apache is the best home for the project on a long-term basis.
> >
> > == Current Status ==
> >
> > === Meritocracy ===
> > Since Druid was first open sourced, the original developers have solicited
> > contributions from others, including through our blog, the project mailing
> > lists, and !GitHub pull requests. We have an
> > Apache-inspired governance structure with a PMC and committers, and our
> > committer ranks include a good number of people from outside the original
> > development team.
> >
> > === Community ===
> >
> > The Druid core developers have sought to nurture a community throughout
> the
> > life of the project. We use !GitHub as the focal point for bug reports
> and
> > code contributions, and the mailing lists for most other discussion. To
> try
> > to make people feel welcome, we've also spelled this out on a
> "CONTRIBUTE"
> > link from the project page: http://druid.io/community/. Today we have an
> > active contributor base (a typical release has ~40 contributors) and
> > mailing list.
> >
> > === Core Developers ===
> >
> > Druid enjoys good diversity of committer affiliation. The most active
> > developers over the past year are affiliated with four different
> companies:
> > Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
> > committers on other ASF projects, including Apache Airflow, Apache
> > Curator, and Apache Calcite. The original developers of Druid remain
> > involved in the project.
> >
> > === Alignment ===
> >
> > Druid's current governance structure is Apache-inspired with a PMC and
> > committers chosen by a meritocratic process. Additionally, Druid
> integrates
> > with a number of other Apache projects, including Kafka, Hadoop, Hive,
> > Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
> >
> > == Known Risks ==
> >
> > === Orphaned products ===
> >
> > The risk of Druid becoming orphaned is low, due to a diverse committer
> base
> > that is invested in the future of the project.
> >
> > === Inexperience with Open Source ===
> >
> > Druid's core developers have been running it as a community-oriented open
> > source project for some time now, and many of them are committers on
> other
> > open source projects as well, including Apache Airflow, Apache Curator,
> and
> > Apache Calcite.
> >
> > === Homogeneous Developers ===
> >
> > Druid's current diversity of committer affiliation means that we have
> > become accustomed to working collaboratively and in the open. We hope
> that
> > a transition to the ASF helps Druid's contributor base become even more
> > diverse.
> >
> > === Reliance on Salaried Developers ===
> >
> > Druid's user base and contributor base skew heavily towards salaried
> > developers. We believe this is natural, since Druid is designed to be
> > deployed on large clusters and therefore tends to be deployed by
> > organizations rather than by individuals. Nevertheless, many current Druid
> > developers have continued working on the project even through job changes,
> > which we take to be a good sign of developer commitment and personal
> > interest.
> >
> > === Relationships with Other Apache Products ===
> >
> > Druid integrates with a number of other Apache projects. Druid internally
> > uses Calcite for SQL planning, and Curator and !ZooKeeper for
> coordination.
> > Druid can read data in Avro or Parquet format. Druid can load data from
> > streams in Kafka or from files in Hadoop. Druid integrates with Hive as
> an
> > option for SQL query acceleration. Druid data can be visualized by
> Superset
> > (incubating).
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Druid is a successful project with a diverse community. The main reason for
> > pursuing incubation is to find a stable, long-term home for the project
> > with a well-known governance philosophy.
> >
> > == Required Resources ==
> >
> > === Mailing lists ===
> >
> > We would like to migrate the existing Druid mailing lists from Google
> > Groups to Apache.
> >
> >  * druid-user@googlegroups -> us...@druid.incubator.apache.org
> >  * druid-development@googlegroups -> d...@druid.incubator.apache.org
> >
> > === Source control ===
> >
> > Druid development currently takes place on !GitHub. We would like to
> > continue using !GitHub, if possible, in order to preserve the workflows
> the
> > community has developed around !GitHub pull requests.
> >
> > === Issue tracking ===
> > Druid currently uses !GitHub issues for issue tracking. We would like to
> > migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
> >
> > == Documentation ==
> >
> > Druid's documentation can be found at http://druid.io/docs/latest/.
> >
> > == Initial Source ==
> >
> > Druid was initially open-sourced by Metamarkets in 2012 and has been run
> in
> > a community-governed fashion since then. The code is currently hosted at
> > https://github.com/druid-io/ and includes the following repositories:
> >
> >  * druid (primary repository)
> >  * druid-console (web console for Druid)
> >  * druid-io.github.io (source for Druid's website at http://druid.io/)
> >  * tranquility (realtime stream push client for Druid)
> >  * docker-druid (Docker image for Druid)
> >  * pydruid (Python library)
> >  * RDruid (R library)
> >  * oss-parent (Maven POM files)
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > A complete set of the open source code needs to be licensed from the
> owning
> > organization to the Foundation. Commercial legal counsel for the owning
> > organization will review the standard Foundation licensing paperwork and
> > propose any updates as needed. This license will enable Apache to
> incubate
> > and manage the Druid project moving forward.
> >
> > Other Druid paraphernalia to be transferred to Apache consists of:
> >
> >  * !GitHub organization at https://github.com/druid-io/
> >  * Twitter account at https://twitter.com/druidio
> >  * "druid.io" domain name
> >  * "Druid" trademark assignment per the Foundation's standard paperwork. The
> > trademark assignment paperwork shall be reviewed by the owning
> > organization's commercial and IP counsel
> >  * CLAs - all rights in the code licensed above should encompass the CLAs
> > that existed between developers and the owning organization
> >
> > A copyright license to the code, trademark assignment of Druid, and
> > transfer of other paraphernalia to Apache should be sufficient to cover
> all
> > rights required by Apache to operate the project.
> >
> > == External Dependencies ==
> > External dependencies distributed with Druid currently all have one of the
> > following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with one
> > exception: the optional Druid MySQL metadata store extension depends on
> > MySQL Connector/J, which is GPL-licensed. Druid currently packages this as
> > a separate download; see how it is currently presented at
> > http://druid.io/downloads.html. As part of incubation we intend to
> > determine the best strategy for handling the MySQL extension.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Initial Committers ==
> >
> > The initial committers for incubation are the current set of committers
> on
> > Druid who have expressed interest in being involved in Apache incubation.
> > Affiliations are listed where relevant. We may seek to add other
> committers
> > during incubation; for example, we would want to add any current Druid
> > committers who express an interest after incubation begins.
> >
> >  * Charles Allen (char...@allen-net.com) (Snap)
> >  * David Lim (david.clarence....@gmail.com) (Imply)
> >  * Eric Tschetter (ched...@apache.org) (Splunk)
> >  * Fangjin Yang (f...@imply.io) (Imply)
> >  * Gian Merlino (g...@apache.org) (Imply)
> >  * Himanshu Gupta (g.himan...@gmail.com) (Oath)
> >  * Jihoon Son (jihoon...@apache.org) (Imply)
> >  * Jonathan Wei (jon....@imply.io) (Imply)
> >  * Maxime Beauchemin (maximebeauche...@gmail.com) (Lyft)
> >  * Mohamed Slim Bouguerra (slim.bougue...@gmail.com) (Hortonworks)
> >  * Nishant Bangarwa (nish...@apache.org) (Hortonworks)
> >  * Parag Jain (paragjai...@gmail.com) (Oath)
> >  * Roman Leventov (leventov...@gmail.com) (Metamarkets)
> >  * Xavier Léauté (xav...@leaute.com) (Confluent)
> >
> > == Sponsors ==
> >
> >  * Champion: Julian Hyde
> >  * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
> >  * Sponsoring entity: Apache Incubator
> >
>
