+1 as well. I think the work Dr. Elephant is doing could also be applied to more than Spark and Hadoop.
Tim

On Tue, Mar 6, 2018 at 8:38 PM, Kevin A. McGrail <kmcgr...@apache.org> wrote:

> I'm intrigued by the proposal and the product. I'm a 0.5+.
>
> I'd love to know more about why LI put it on GitHub and what problems it's
> having that are leading to a foundation.
>
> --
> Kevin A. McGrail
> Asst. Treasurer & VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
> On Tue, Mar 6, 2018 at 8:27 PM, Gangumalla, Uma <uma.ganguma...@intel.com>
> wrote:
>
>> I would +1 to have as a separate project instead of pushing under Hadoop.
>> When a project can sustain by having potential to build community on its
>> own and can run logically as independent module, I feel that’s good enough
>> to start as separate project.
>>
>> I could not recall the discussions on removal of Vaidya package from
>> Hadoop. If someone remembers, it would be great to know the reasons for
>> removal of that package from Hadoop base. [ probably at the time of
>> mavenization ? ]
>>
>> Regards,
>> Uma
>>
>> On 3/6/18, 3:17 PM, "md...@cloudera.com on behalf of Mike Drob" <
>> md...@cloudera.com on behalf of md...@apache.org> wrote:
>>
>> Why does Dr. Elephant make sense as a separate project instead of
>> contributing to Hadoop directly?
>>
>> What is the relationship between Dr. Elephant and the (now seemingly
>> defunct) Hadoop Vaidya?
>>
>> On Tue, Mar 6, 2018 at 5:08 PM, Carl Steinbach <c...@apache.org> wrote:
>>
>> > Hi,
>> >
>> > I would like to propose Dr. Elephant as an Apache Incubator
>> > project. The proposal is available as a draft at
>> > https://wiki.apache.org/incubator/DrElephantProposal. I have also
>> > included the text of the proposal below.
>> >
>> > Any feedback from the community is much appreciated.
>> >
>> > Thanks.
>> >
>> > - Carl
>> >
>> >
>> > = ABSTRACT =
>> >
>> > Dr.
>> > Elephant is a performance monitoring and tuning service for Apache
>> > Hadoop and Apache Spark jobs and workflows. While the system is
>> > primarily aimed at developers, we have discovered that it is also
>> > popular with cluster operators who use it to monitor the health of
>> > workloads running on their clusters.
>> >
>> > = PROPOSAL =
>> >
>> > Dr. Elephant was open sourced by LinkedIn in 2016 and is currently
>> > hosted on GitHub. We believe that being a part of the Apache Software
>> > Foundation will improve the diversity and help form a strong community
>> > around the project.
>> >
>> > LinkedIn submits this proposal to donate the code base to the Apache
>> > Software Foundation. The code is already under Apache License 2.0.
>> > Both the source code and documentation are hosted on GitHub.
>> >
>> > * Code: http://github.com/linkedin/dr-elephant
>> > * Documentation: https://github.com/linkedin/dr-elephant/wiki
>> >
>> > = Background =
>> >
>> > Dr. Elephant is a service that helps users of Apache Hadoop and Apache
>> > Spark understand, analyze, and improve the performance of jobs and
>> > workflows running on their clusters. It automatically gathers metrics,
>> > performs analysis, and presents the results along with actionable
>> > advice. The goal of the project is to improve developer productivity
>> > and increase cluster efficiency by reducing the time and domain
>> > expertise required to diagnose and treat sick jobs. It analyzes Hadoop
>> > and Spark jobs using a set of configurable, extensible, rule-based
>> > heuristics that provide insights on job performance, and then uses
>> > this information to provide recommendations about how to tune jobs to
>> > make them run more efficiently.
>> >
>> > Dr. Elephant was open sourced in 2016 after two years of
>> > successful production use at LinkedIn.
>> > In the time since, many new
>> > features have been added, including support for the Oozie and Airflow
>> > workflow schedulers, improved metrics, and enhancements to the Spark
>> > history fetcher and Spark heuristics. It is also important to note
>> > that many of these contributions came from developers outside of
>> > LinkedIn. We have also been happy to see that many people have been
>> > able to benefit from running Dr. Elephant, including companies like
>> > Airbnb, Foursquare, Hulu, and Pinterest.
>> >
>> > = RATIONALE =
>> >
>> > Dr. Elephant's entry to the ASF will be beneficial to both the
>> > Dr. Elephant and Apache communities. Dr. Elephant has greatly
>> > benefited from its open source roots. Its community and adoption have
>> > grown greatly as a result. More importantly, the feedback from the
>> > community, whether through interactions at meetups or through the
>> > mailing list, has allowed for a rich exchange of ideas. We believe a
>> > partnership with the Apache Foundation is the logical next step. The
>> > Dr. Elephant community will greatly benefit from the established
>> > development and consensus processes that have worked well for other
>> > projects. The Apache process has served many other open source
>> > projects well and we believe that the Dr. Elephant community will
>> > greatly benefit from these practices as well.
>> >
>> > = CURRENT STATUS =
>> >
>> > Dr. Elephant is currently open sourced under the Apache License
>> > Version 2.0 and is available at github.com/linkedin/dr-elephant. All
>> > of the development is done using GitHub Pull Requests.
>> >
>> > We are aware of at least 10 organizations that are running
>> > Dr. Elephant, and many of these organizations have also contributed
>> > code. Dr. Elephant has also been integrated into commercial products
>> > such as Pepperdata's Application Profiler.
>> >
>> > = INITIAL GOALS =
>> >
>> > Our initial goals are as follows:
>> >
>> > * Migrate the existing codebase to Apache
>> > * Study and integrate with the Apache development process
>> > * Ensure all dependencies are compliant with Apache License version 2.0
>> > * Incremental development and releases per Apache guidelines
>> > * Diversify the set of core developers and committers
>> >
>> > = MERITOCRACY =
>> >
>> > Following the Apache meritocracy model, we intend to build an open and
>> > diverse community around Dr. Elephant. We will encourage the community to
>> > contribute to discussions and the codebase.
>> >
>> > = COMMUNITY =
>> >
>> > The need for a simple and understandable performance monitoring and
>> > tuning service for Hadoop and Spark is tremendous. Dr. Elephant is
>> > currently being used by at least 10 organizations worldwide (some
>> > examples are listed here). We hope to extend the contributor base
>> > significantly by bringing Dr. Elephant into Apache.
>> >
>> > = CORE DEVELOPERS =
>> >
>> > Dr. Elephant was started by engineers at LinkedIn. Many other
>> > individuals and organizations have contributed to the project, and
>> > this diversity is reflected in the list of initial committers.
>> >
>> > = ALIGNMENT =
>> >
>> > Apache is the most natural home for Dr. Elephant because of its close
>> > relationship to Apache Hadoop and Apache Spark, and its integration
>> > with Apache Oozie and Apache Airflow (incubating).
>> >
>> > = KNOWN RISKS =
>> >
>> > == Orphaned products ==
>> >
>> > The risk of the Dr. Elephant project being abandoned is minimal. As
>> > noted earlier, there are many organizations that have benefitted from
>> > Dr. Elephant, and which are thus incentivized to continue
>> > development. In addition, the software vendor Pepperdata has
>> > integrated Dr. Elephant into their Application Profiler product.
>> >
>> > == Inexperience with Open Source ==
>> >
>> > Dr.
>> > Elephant has existed as a healthy open source project since
>> > 2016. Any risks that we foresee are ones associated with scaling our
>> > open source communication and operation process rather than with
>> > inherent inexperience in operating as an open source project.
>> >
>> > == Homogenous Developers ==
>> >
>> > Apart from LinkedIn’s developers, Dr. Elephant has developers from
>> > Airbnb, Pepperdata, Flipkart, Hulu, Foursquare, Altiscale, PayPal,
>> > Evariant, Didi, Trivago, and Cardlytics.
>> >
>> > A lot of effort has been put into efficient communication between all
>> > the developers. We have set up different forums for communication, such
>> > as GitHub issues, a Google Groups mailing list, Gitter chat, weekly
>> > hangouts, and frequent meetups.
>> >
>> > == Reliance on Salaried Developers ==
>> >
>> > It is expected that Dr. Elephant development will occur on both
>> > salaried time and on volunteer time, after hours. Many of the initial
>> > committers are paid by their employer to contribute to this
>> > project. However, they are all passionate about the project, and we
>> > are confident that the project will continue even if no salaried
>> > developers contribute to the project. We are committed to recruiting
>> > additional committers including non-salaried developers.
>> >
>> > == An Excessive Fascination with the Apache Brand ==
>> >
>> > While we respect the reputation of the Apache brand and have no doubts
>> > that it will attract contributors and users, we believe the ASF is the
>> > right home for Dr. Elephant to foster a great community that will lead
>> > to a better outcome in the long term.
>> >
>> > = Documentation =
>> >
>> > Dr. Elephant's developer wiki: https://github.com/linkedin/dr-elephant/wiki
>> >
>> > = Initial Source =
>> >
>> > Dr. Elephant's initial source contribution will come from
>> > https://github.com/linkedin/dr-elephant
>> >
>> > The code is licensed under the Apache License V2.
>> >
>> > = Source and Intellectual Property Submission Plan =
>> >
>> > The Dr. Elephant codebase is currently hosted on GitHub. This is the
>> > exact codebase that we would migrate to the Apache Software
>> > Foundation. The Dr. Elephant source code is already licensed under
>> > Apache License Version 2.0. Going forward, we will continue to have
>> > all the contributions licensed directly to the Apache Software
>> > Foundation through our signed Individual Contributor License
>> > Agreements for all of the committers on the project.
>> >
>> > = External Dependencies =
>> >
>> > To the best of our knowledge all of Dr. Elephant’s dependencies are
>> > distributed under Apache Software Foundation compatible licenses. Upon
>> > acceptance to the incubator, we will begin a thorough analysis of all
>> > transitive dependencies to verify this fact and introduce license
>> > checking into the build and release process.
>> >
>> > = Cryptography =
>> >
>> > We do not expect Dr. Elephant to be a controlled export item due to
>> > the use of encryption.
>> >
>> > = Required Resources =
>> >
>> > == Mailing lists ==
>> >
>> > * priv...@drelephant.incubator.apache.org (moderated subscriptions)
>> > * comm...@drelephant.incubator.apache.org
>> > * d...@drelephant.incubator.apache.org
>> > * iss...@drelephant.incubator.apache.org
>> > * u...@drelephant.incubator.apache.org
>> >
>> > == Git Repository ==
>> >
>> > Git is the preferred source control system:
>> > git://git.apache.org/dr-elephant
>> >
>> > == Issue Tracking ==
>> >
>> > JIRA project DOCTOR
>> >
>> > == Other Resources ==
>> >
>> > The existing code already has unit and integration tests, so we would
>> > like a Jenkins instance to run them whenever a new patch is
>> > submitted. This can be added after project creation.
>> >
>> > = Initial Committers =
>> >
>> > * Akshay Rai <akshayrai09 at gmail dot com>
>> > * Anant Nag <nntnag17 at gmail dot com>
>> > * Chetna Chaudhari <chetnachaudhari at gmail dot com>
>> > * Clemens Valiente <clemens dot valiente at gmail dot com>
>> > * Fangshi Li <shengzhixia at gmail dot com>
>> > * George Wu <georgieewuu at gmail dot com>
>> > * Krishna Puttaswamy <krishnaprasad dot pn at gmail dot com>
>> > * Maxime Kestemont <maxkestemont at hotmail dot com>
>> > * Noam Shaish <noamshaish at gmail dot com>
>> > * Paul Reed Bramsen <prb at paulbramsen dot com>
>> > * Ragesh K R <ragesh dot rajagopalan at gmail dot com>
>> > * Shankar Manian <shankar37 at gmail dot com>
>> > * Shahrukh Khan <shahrukhkhan489 at gmail dot com>
>> > * Shekhar Gupta <shkhrgpt at gmail dot com>
>> > * Shida Li <lishid at gmail dot com>
>> >
>> > == Affiliations ==
>> >
>> > * Akshay Rai - LinkedIn
>> > * Anant Nag - LinkedIn
>> > * Chetna Chaudhari - SkyTv New Zealand
>> > * Clemens Valiente - trivago GmbH
>> > * Fangshi Li - LinkedIn
>> > * George Wu - Pinterest
>> > * Krishna Puttaswamy - Airbnb
>> > * Mark Wagner - LinkedIn
>> > * Maxime Kestemont - Criteo
>> > * Noam Shaish - Nordea Bank
>> > * Ragesh K R - LinkedIn
>> > * Shankar Manian - LinkedIn
>> > * Shahrukh Khan - Hortonworks
>> > * Shekhar Gupta - Pepperdata
>> > * Shida Li - Dynalist Inc.
>> >
>> > = Sponsors =
>> > == Champion ==
>> > * Carl Steinbach
>> >
>> > == Nominated Mentors ==
>> > * Carl Steinbach (LinkedIn)
>> >
>> > == Sponsoring Entity ==
>> > The Apache Incubator
>> >
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org