Hi Alan, I already took the liberty of adding Henry to the proposal.
Btw, since I've learned that one person can be both a committer and a mentor, I'd also like to volunteer as a mentor.

Best,
Sebastian

On 07.04.2014 at 18:15, "Alan Gates" <ga...@hortonworks.com> wrote:
> Henry, definitely glad to have you on board. Ashutosh Chauhan (new to the > IPMC) has also expressed his interest to me offline in being a mentor. > I’ll add both of you to the proposal. > > Alan. > > On Apr 6, 2014, at 10:34 AM, Henry Saputra <henry.sapu...@gmail.com> > wrote: > > > Hi Guys, > > > > The proposal looks great and I would love to help to sign up as a > > Mentor if you guys still have space for one. > > > > > > - Henry > > > > > > On Sun, Mar 30, 2014 at 12:14 AM, Alan Gates <ga...@hortonworks.com> > wrote: > >> I would like to propose Stratosphere as an Apache Incubator project. I > have posted the proposal to > https://wiki.apache.org/incubator/StratosphereProposal and posted the > text of the proposal below. > >> > >> Alan. > >> > >> = Stratosphere = > >> > >> == Abstract == > >> Stratosphere is an open source system for parallel data analysis. > Stratosphere deeply integrates MapReduce and database technologies to > provide expressive and optimizable programming interfaces and at the same > time efficient and scalable execution. > >> > >> == Proposal == > >> Stratosphere is an open source system for expressive, declarative, > fast, and efficient data analysis. Stratosphere combines the scalability > and programming flexibility of distributed MapReduce-like platforms with > the efficiency, out-of-core execution, and query optimization capabilities > found in parallel databases. > >> > >> == Background == > >> There is currently a need for general-purpose cluster computing > platforms that are compatible with the Hadoop ecosystem, are more > efficient, easier to use, and can support more applications than Hadoop > MapReduce, but are not restricted to a specific data model and language > (such as the relational model and a variant of SQL). 
Stratosphere fulfils > these needs. > >> > >> Stratosphere exposes expressive APIs in Java and Scala (conceptually > similar to Spark, Cascading, Scalding) that allow arbitrary user-defined > functions in the same language and data model that the program is written > in. Stratosphere programs pass through a cost-based optimizer that finds > the best execution path for these programs depending on the data and > cluster characteristics. The design and implementation of Stratosphere are > based on research that generalizes query optimizers in relational > databases. Stratosphere has a distributed runtime that is architected upon > the principles of parallel databases, providing true pipelining (a basis > for stream processing) and efficient out-of-core algorithms for grouping, > sorting, joining, and aggregating data. Stratosphere provides first-class > support for iterative algorithms via a built-in iterate operator, covering > Machine Learning and graph analysis use cases. It achieves performance > similar to Apache Giraph without being a specialized graph processing > system. > >> > >> Stratosphere has undergone three major releases (v0.1, v0.2, v0.4) and > some minor ones. > >> > >> == Rationale == > >> Stratosphere started out in 2008 as a research project by the Technical > University of Berlin, the Humboldt University of Berlin, and the Hasso > Plattner Institute, and has received subsequent funding from the German > Research Council, the European Institute of Innovation and Technology, the > European Commission, and industry. > >> > >> The traction of Stratosphere has by far exceeded our initial > expectations, and we are therefore seeking an organizational long-term home > for Stratosphere beyond the University walls that will house and further > encourage contributors from companies and other organizations that are > interested in Stratosphere. We believe that the Apache Software Foundation > is the ideal home for Stratosphere. 
Stratosphere integrates with several > existing Apache projects, such as HDFS, YARN, HBase, and Avro. The team is > familiar with the Apache processes and fully subscribes to the Apache > mission. One of the proposing members is a long-time Apache contributor and > PMC member. > >> > >> == Initial Goals == > >> * Move the existing codebase to Apache > >> * Integrate with the Apache development process > >> * Ensure all dependencies are compliant with Apache License version 2.0 > >> * Incremental development and releases per Apache guidelines > >> > >> > >> == Current Status == > >> === Meritocracy === > >> Stratosphere operated on meritocratic principles from the get go. The > initial project proposal submitted to the German Research Council > >> in 2008 stated that all code developed in the project will be released > as open source under the Apache 2 license. Currently, all the > >> discussions pertaining to Stratosphere development are public on [[ > https://github.com/stratosphere/stratosphere|GitHub]] and our [[ > https://groups.google.com/forum/#!forum/stratosphere-dev|mailing list]]. > The current incubation proposal includes the major code contributors to > Stratosphere. Several additional people have worked on the Stratosphere > codebase for research prototypes and industry use cases and would be > interested in becoming committers. We are starting with a small committer > group and we plan to add additional committers following an open > merit-based decision process during the incubation phase. > >> > >> === Community === > >> Currently, the core of Stratosphere is developed at TU Berlin, mainly > by the committers listed in this proposal. Additional people from several > Universities and companies in Europe are working with Stratosphere and are > interested in becoming committers to the project. 
> >> > >> During the years, Stratosphere has been adopted as a platform for > research and teaching in several Universities (TU Berlin, HU Berlin, HPI, > RWTH, Inria, KTH, U. Trento, UCSD, and others), and it is currently > witnessing its first industrial installations. We are seeing a rapidly > growing interest in Stratosphere by both startups and large companies, as > well as a growing community (our first [[ > http://stratosphere.eu/events/2013/summit.html|Stratosphere Summit]] in > November 2013 attracted over 80 participants). Stratosphere was recently > accepted as a mentoring organization in Google Summer of Code 2014. > >> > >> We believe that acceptance in the Apache Software Foundation will > consolidate the current community under one organizational umbrella, and > most importantly accelerate the growth of the community. > >> > >> === Core developers === > >> The core developers of the system are Stephan Ewen, Fabian Hueske, > Daniel Warneke, Robert Metzger, Ufuk Celebi, and Aljoscha Krettek, who are > all committers in the current proposal. > >> > >> === Alignment === > >> Stratosphere is compatible with, and related to several Apache > projects. Stratosphere re-uses parts of Apache Hadoop, in particular HDFS > and YARN, as well as Apache HBase and Apache Avro. Stratosphere is a very > good compilation target for query languages such as Apache Hive and Apache > Pig. > >> > >> == Known Risks == > >> === Orphaned Products === > >> There is strong interest in Stratosphere by several companies and > organizations, and there is currently a long-term commitment to fund > salaried developers for Stratosphere by public and private organizations in > Europe. > >> > >> === Inexperience with Open Source === > >> Sebastian Schelter is a committer and PMC member of Apache Mahout and > Apache Giraph, member of the Apache Software Foundation, member of the > Incubator PMC and project mentor for Apache Drill. 
Sebastian, along with > our mentors, will guide the rest of the committers that have experience > with releasing software as open source but little experience in > participating in an open source project besides Stratosphere itself. > >> > >> In mid-2013 Stratosphere transitioned from an “open source project with > publicly accessible source code” to an open source project that puts the > community first. We moved from a University-hosted git repository to > GitHub, where we discuss all issues publicly. This also includes release > planning (via GitHub’s milestone feature) and code reviews. We also moved > our build system to the publicly available Travis-CI. The mailing lists are > hosted with Google Groups, and we use the public Maven repository > infrastructure of Sonatype. The source code of the www.stratosphere.eu website > is publicly available and is meant to be changed by external > contributors (for example for documentation purposes). > >> > >> === Homogeneous Developers === > >> Most committers in this proposal belong to the same institution (TU > Berlin). The engagement of these committers goes well beyond the necessary > development to support research, and all committers work on Stratosphere in > their free time. Several people from other institutions are working on and > are familiar with the Stratosphere codebase. We will work to attract them > as future committers during the incubation phase, following a merit-based > approach. > >> > >> === Reliance on Salaried Developers === > >> Currently, Stratosphere receives support from salaried developers, in > particular from graduate students at TU Berlin that are funded by the > German Research Council, the European Institute of Technology, and the > European Commission. These students work in their free time on Stratosphere > in addition to their employment. > >> > >> We expect that Stratosphere development will occur on both salaried and > volunteer time. 
We will recruit additional committers, including > non-salaried developers, and we will work to ensure that the project will > move forward independently of salaried developers. > >> > >> === Relationship with Other Apache Products === > >> Stratosphere interfaces with several existing Apache projects: Apache > HBase for storage, Apache Hadoop (HDFS for storage, YARN for resource > management, and Stratosphere contains a generic wrapper for Hadoop > MapReduce input formats), and Apache Avro (for serialization). Stratosphere > uses Apache Maven and Apache Commons libraries internally. Stratosphere can > be a great compilation target for Apache Pig and Apache Hive, although such > functionality is not yet implemented. > >> > >> Stratosphere is also related with several projects undergoing > incubation in the Apache Incubation project, such as Tez, Drill, and Spark > (graduated). While all these projects target sufficiently different spaces > and have different architectures, it would be interesting to explore code > reuse possibilities. For example, we are currently basing our design for > compiling SQL to Stratosphere on the Optiq library, also used by Apache > Drill. > >> > >> === An Excessive Fascination with the Apache Brand === > >> We believe that the Apache brand will help us attract contributors to > Stratosphere, by giving us a well-defined, transparent development process > under a known brand. At the same time, Stratosphere already has a healthy > community and current funding guarantees the further codebase development > and growth of the project for the next 3-5 years. The reason for this > proposal is not to gain publicity, but to further strengthen the longevity > of the project as explained in the Rationale section. 
> >> > >> == Documentation == > >> * [[https://stratosphere.eu|Project website]] > >> * [[http://stratosphere.eu/docs/0.4/|Documentation]] > >> * [[https://github.com/stratosphere/stratosphere|Codebase]] > >> * [[https://groups.google.com/forum/#!forum/stratosphere-dev|Mailing list]] > >> > >> == Initial Source == > >> Stratosphere is hosted on [[ > https://github.com/stratosphere/stratosphere|GitHub]] . This is the > codebase that we will migrate to the Apache Foundation. The code was > previously hosted on TU Berlin’s own git infrastructure. It has always > been Apache 2.0 licensed. > >> > >> === Source and Intellectual Property Submission Plan === > >> All initial and past committers will sign a CLA with the ASF while the > incubator proposal for Stratosphere is being discussed. All organizations > that have employed Stratosphere contributors in the past will sign an SGA. > Current contributors will sign a CCLA. All major contributors are still > active in the project. > >> > >> === External Dependencies === > >> All critical dependencies are, to the extent of our knowledge, from > other Apache projects. These include Apache Hadoop (for YARN and HDFS) and > some libraries (log4j, commons codec, junit and more). Our web frontend > uses some MIT-licensed JavaScript libraries. > >> > >> == Required Resources == > >> > >> === Mailing list === > >> We will migrate our mailing lists to the following: > >> * us...@stratosphere.incubator.apache.org > >> * d...@stratosphere.incubator.apache.org > >> * priv...@stratosphere.incubator.apache.org > >> * comm...@stratosphere.incubator.apache.org > >> > >> === Source control === > >> We would like to use Git for source control and enable GitHub mirroring > functionality, where code reviews on GitHub are automatically > >> forwarded to the developer mailing list. 
(See also: [[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and]]) > >> > >> > >> === Issue tracking === > >> We are currently using GitHub for issue tracking. We request an > Apache-hosted JIRA, and we will import existing issues there. > >> > >> > >> == Initial committers == > >> * Stephan Ewen - stephan.e...@tu-berlin.de > >> * Fabian Hueske - fabian.hue...@tu-berlin.de > >> * Daniel Warneke - warn...@posteo.de > >> * Robert Metzger - metrob...@gmail.com > >> * Ufuk Celebi - u.cel...@fu-berlin.de > >> * Aljoscha Krettek - aljoscha.kret...@gmail.com > >> * Kostas Tzoumas - kostas.tzou...@tu-berlin.de > >> * Sebastian Schelter - s...@apache.org > >> > >> === Affiliations === > >> * Stephan Ewen (TU Berlin) > >> * Fabian Hueske (TU Berlin) > >> * Daniel Warneke (Amadeus IT Group) > >> * Robert Metzger (TU Berlin) > >> * Ufuk Celebi (FU Berlin) > >> * Aljoscha Krettek (TU Berlin) > >> * Kostas Tzoumas (TU Berlin) > >> * Sebastian Schelter (TU Berlin) > >> > >> == Sponsors == > >> === Champion === > >> Alan Gates (ga...@apache.org) > >> > >> === Nominated Mentors === > >> * Sean Owen (sro...@apache.org) (Note: Sean is an Apache member but > not currently on the IPMC; he will need to request IPMC membership) > >> * Ted Dunning (tdunn...@apache.org) > >> * Owen O'Malley (omal...@apache.org) > >> > >> === Sponsoring Entity === > >> The Apache Incubator > >> > >> > >> -- > >> CONFIDENTIALITY NOTICE > >> NOTICE: This message is intended for the use of the individual or > entity to > >> which it is addressed and may contain information that is confidential, > >> privileged and exempt from disclosure under applicable law. If the > reader > >> of this message is not the intended recipient, you are hereby notified > that > >> any printing, copying, dissemination, distribution, disclosure or > >> forwarding of this communication is strictly prohibited. 
If you have > >> received this communication in error, please contact the sender > immediately > >> and delete it from your system. Thank You. > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > >> For additional commands, e-mail: general-h...@incubator.apache.org