I would +1 having this as a separate project instead of pushing it under Hadoop. When a project has the potential to build and sustain a community of its own and can logically run as an independent module, I feel that's good enough to start it as a separate project.
I could not recall the discussions on the removal of the Vaidya package from Hadoop. If someone remembers, it would be great to know the reasons for removing that package from the Hadoop base. [Probably at the time of mavenization?]

Regards,
Uma

On 3/6/18, 3:17 PM, "md...@cloudera.com on behalf of Mike Drob" <md...@cloudera.com on behalf of md...@apache.org> wrote:

Why does Dr. Elephant make sense as a separate project instead of contributing to Hadoop directly?

What is the relationship between Dr. Elephant and the (now seemingly defunct) Hadoop Vaidya?

On Tue, Mar 6, 2018 at 5:08 PM, Carl Steinbach <c...@apache.org> wrote:

> Hi,
>
> I would like to propose Dr. Elephant as an Apache Incubator
> project. The proposal is available as a draft at
> https://wiki.apache.org/incubator/DrElephantProposal. I have also
> included the text of the proposal below.
>
> Any feedback from the community is much appreciated.
>
> Thanks.
>
> - Carl
>
>
> = ABSTRACT =
>
> Dr. Elephant is a performance monitoring and tuning service for Apache
> Hadoop and Apache Spark jobs and workflows. While the system is
> primarily aimed at developers, we have discovered that it is also
> popular with cluster operators who use it to monitor the health of
> workloads running on their clusters.
>
> = PROPOSAL =
>
> Dr. Elephant was open sourced by LinkedIn in 2016 and is currently
> hosted on GitHub. We believe that being a part of the Apache Software
> Foundation will improve the diversity and help form a strong community
> around the project.
>
> LinkedIn submits this proposal to donate the code base to the Apache
> Software Foundation. The code is already under Apache License 2.0.
> Both the source code and documentation are hosted on GitHub.
>
> * Code: http://github.com/linkedin/dr-elephant
> * Documentation: https://github.com/linkedin/dr-elephant/wiki
>
> = Background =
>
> Dr. Elephant is a service that helps users of Apache Hadoop and Apache
> Spark understand, analyze, and improve the performance of jobs and
> workflows running on their clusters. It automatically gathers metrics,
> performs analysis, and presents the results along with actionable
> advice. The goal of the project is to improve developer productivity
> and increase cluster efficiency by reducing the time and domain
> expertise required to diagnose and treat sick jobs. It analyzes Hadoop
> and Spark jobs using a set of configurable, extensible, rule-based
> heuristics that provide insights on job performance, and then uses
> this information to provide recommendations about how to tune jobs to
> make them run more efficiently.
>
> Dr. Elephant was open sourced in 2016 after two years of
> successful production use at LinkedIn. In the time since, many new
> features have been added, including support for the Oozie and Airflow
> workflow schedulers, improved metrics, and enhancements to the Spark
> history fetcher and Spark heuristics. It is also important to note
> that many of these contributions came from developers outside of
> LinkedIn. We have also been happy to see that many people have been
> able to benefit from running Dr. Elephant, including companies like
> Airbnb, Foursquare, Hulu, and Pinterest.
>
> = RATIONALE =
>
> Dr. Elephant's entry to the ASF will be beneficial to both the
> Dr. Elephant and Apache communities. Dr. Elephant has greatly
> benefited from its open source roots. Its community and adoption have
> grown greatly as a result. More importantly, the feedback from the
> community, whether through interactions at meetups or through the
> mailing list, has allowed for a rich exchange of ideas. We believe a
> partnership with the Apache Foundation is the logical next step. The
> Dr. Elephant community will greatly benefit from the established
> development and consensus processes that have worked well for other
> projects. The Apache process has served many other open source
> projects well and we believe that the Dr. Elephant community will
> greatly benefit from these practices as well.
>
> = CURRENT STATUS =
>
> Dr. Elephant is currently open sourced under the Apache License
> Version 2.0 and is available at github.com/linkedin/dr-elephant. All
> of the development is done using GitHub Pull Requests.
>
> We are aware of at least 10 organizations that are running
> Dr. Elephant, and many of these organizations have also contributed
> code. Dr. Elephant has also been integrated into commercial products
> such as Pepperdata's Application Profiler.
>
> = INITIAL GOALS =
>
> Our initial goals are as follows:
>
> * Migrate the existing codebase to Apache
> * Study and integrate with the Apache development process
> * Ensure all dependencies are compliant with Apache License version 2.0
> * Incremental development and releases per Apache guidelines
> * Diversify the set of core developers and committers
>
> = MERITOCRACY =
>
> Following the Apache meritocracy model, we intend to build an open and
> diverse community around Dr. Elephant. We will encourage the community to
> contribute to discussions and the codebase.
>
> = COMMUNITY =
>
> The need for a simple and understandable performance monitoring and
> tuning service for Hadoop and Spark is tremendous. Dr. Elephant is
> currently being used by at least 10 organizations worldwide (some
> examples are listed here). We hope to extend the contributor base
> significantly by bringing Dr. Elephant into Apache.
>
> = CORE DEVELOPERS =
>
> Dr. Elephant was started by engineers at LinkedIn. Many other
> individuals and organizations have contributed to the project, and
> this diversity is reflected in the list of initial committers.
>
> = ALIGNMENT =
>
> Apache is the most natural home for Dr. Elephant because of its close
> relationship to Apache Hadoop and Apache Spark, and its integration
> with Apache Oozie and Apache Airflow (incubating).
>
> = KNOWN RISKS =
>
> == Orphaned products ==
>
> The risk of the Dr. Elephant project being abandoned is minimal. As
> noted earlier, there are many organizations that have benefitted from
> Dr. Elephant, and which are thus incentivized to continue
> development. In addition, the software vendor Pepperdata has
> integrated Dr. Elephant into their Application Profiler product.
>
> == Inexperience with Open Source ==
>
> Dr. Elephant has existed as a healthy open source project since
> 2016. Any risks that we foresee are ones associated with scaling our
> open source communication and operation process rather than with
> inherent inexperience in operating as an open source project.
>
> == Homogenous Developers ==
>
> Apart from LinkedIn's developers, Dr. Elephant has developers from
> Airbnb, Pepperdata, Flipkart, Hulu, Foursquare, Altiscale, PayPal,
> Evariant, Didi, Trivago, and Cardlytics.
>
> A lot of effort has been put into efficient communication between all
> the developers. We have set up different forums for communication like
> GitHub issues, a Google Groups mailing list, Gitter chat, weekly
> hangouts, and frequent meetups.
>
> == Reliance on Salaried Developers ==
>
> It is expected that Dr. Elephant development will occur on both
> salaried time and on volunteer time, after hours. Many of the initial
> committers are paid by their employer to contribute to this
> project. However, they are all passionate about the project, and we
> are confident that the project will continue even if no salaried
> developers contribute to the project. We are committed to recruiting
> additional committers including non-salaried developers.
>
> == An Excessive Fascination with the Apache Brand ==
>
> While we respect the reputation of the Apache brand and have no doubts
> that it will attract contributors and users, we believe the ASF is the
> right home for Dr. Elephant to foster a great community that will lead
> to a better outcome in the long term.
>
> = Documentation =
>
> Dr. Elephant's developer wiki: https://github.com/linkedin/dr-elephant/wiki
>
> = Initial Source =
>
> Dr. Elephant's initial source contribution will come from
> https://github.com/linkedin/dr-elephant
>
> The code is licensed under the Apache License V2.
>
> = Source and Intellectual Property Submission Plan =
>
> The Dr. Elephant codebase is currently hosted on GitHub. This is the
> exact codebase that we would migrate to the Apache Software
> Foundation. The Dr. Elephant source code is already licensed under
> Apache License Version 2.0. Going forward, we will continue to have
> all the contributions licensed directly to the Apache Software
> Foundation through our signed Individual Contributor License
> Agreements for all of the committers on the project.
>
> = External Dependencies =
>
> To the best of our knowledge all of Dr. Elephant's dependencies are
> distributed under Apache Software Foundation compatible licenses. Upon
> acceptance to the incubator, we will begin a thorough analysis of all
> transitive dependencies to verify this fact and introduce license
> checking into the build and release process.
>
> = Cryptography =
>
> We do not expect Dr. Elephant to be a controlled export item due to
> the use of encryption.
>
> = Required Resources =
>
> == Mailing lists ==
>
> * priv...@drelephant.incubator.apache.org (moderated subscriptions)
> * comm...@drelephant.incubator.apache.org
> * d...@drelephant.incubator.apache.org
> * iss...@drelephant.incubator.apache.org
> * u...@drelephant.incubator.apache.org
>
> == Git Repository ==
>
> Git is the preferred source control system:
> git://git.apache.org/dr-elephant
>
> == Issue Tracking ==
>
> JIRA project DOCTOR
>
> == Other Resources ==
>
> The existing code already has unit and integration tests, so we would
> like a Jenkins instance to run them whenever a new patch is
> submitted. This can be added after project creation.
>
> = Initial Committers =
>
> * Akshay Rai <akshayrai09 at gmail dot com>
> * Anant Nag <nntnag17 at gmail dot com>
> * Chetna Chaudhari <chetnachaudhari at gmail dot com>
> * Clemens Valiente <clemens dot valiente at gmail dot com>
> * Fangshi Li <shengzhixia at gmail dot com>
> * George Wu <georgieewuu at gmail dot com>
> * Krishna Puttaswamy <krishnaprasad dot pn at gmail dot com>
> * Maxime Kestemont <maxkestemont at hotmail dot com>
> * Noam Shaish <noamshaish at gmail dot com>
> * Paul Reed Bramsen <prb at paulbramsen dot com>
> * Ragesh K R <ragesh dot rajagopalan at gmail dot com>
> * Shankar Manian <shankar37 at gmail dot com>
> * Shahrukh Khan <shahrukhkhan489 at gmail dot com>
> * Shekhar Gupta <shkhrgpt at gmail dot com>
> * Shida Li <lishid at gmail dot com>
>
> == Affiliations ==
>
> * Akshay Rai - LinkedIn
> * Anant Nag - LinkedIn
> * Chetna Chaudhari - SkyTv New Zealand
> * Clemens Valiente - trivago GmbH
> * Fangshi Li - LinkedIn
> * George Wu - Pinterest
> * Krishna Puttaswamy - Airbnb
> * Mark Wagner - LinkedIn
> * Maxime Kestemont - Criteo
> * Noam Shaish - Nordea Bank
> * Ragesh K R - LinkedIn
> * Shankar Manian - LinkedIn
> * Shahrukh Khan - Hortonworks
> * Shekhar Gupta - Pepperdata
> * Shida Li - Dynalist Inc.
>
> = Sponsors =
> == Champion ==
> * Carl Steinbach
>
> == Nominated Mentors ==
> * Carl Steinbach (LinkedIn)
>
> == Sponsoring Entity ==
> The Apache Incubator