Looks like the feedback has been well received. Any reason not to start a vote?
Thanks,
Roman.

On Mon, Aug 4, 2014 at 11:12 PM, Byung-Gon Chun <[email protected]> wrote:
> Hi Jake,
>
> Thank you for the comment.
>
> We discussed how to structure the mailing lists with our mentors.
> We took our mentors' suggestion to start with a minimal set (two mailing
> lists) so as not to miss important discussions, and to split them later
> if there is demand.
>
> Thanks!
> -Gon
>
> ---
> Byung-Gon Chun
>
>
> On Tue, Aug 5, 2014 at 3:04 AM, Jake Farrell <[email protected]> wrote:
>
>> Would suggest you use the following format for the mailing lists (you have
>> the older format listed) and also split the dev and commits lists. A lot of
>> new projects have also been splitting the jira issues out from dev to cut
>> down on noise on the dev list; would add issues@reef if you want to do
>> this.
>>
>> private@reef for private PMC discussions
>> dev@reef for technical discussions
>> commits@reef notification about commits
>> issues@reef jira notifications
>>
>> -Jake
>>
>>
>> On Fri, Aug 1, 2014 at 3:14 AM, Byung-Gon Chun <[email protected]> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to propose REEF as an Apache Incubator project. REEF is a
>> > scale-out computing fabric that eases the development of Big Data
>> > applications on top of resource managers such as Apache YARN and Mesos.
>> >
>> > The proposal is included in plain text below. I would also like to put
>> > this on the wiki, but I don't have privileges to create wiki pages.
>> >
>> > I look forward to hearing everyone's thoughts and feedback!
>> >
>> > -Gon
>> >
>> > --
>> > Byung-Gon Chun
>> >
>> >
>> > ===
>> >
>> > # REEFProposal - Incubator
>> >
>> >
>> > # Abstract
>> >
>> > REEF (Retainable Evaluator Execution Framework) is a scale-out
>> > computing fabric that eases the development of Big Data applications
>> > on top of resource managers such as Apache YARN and Mesos.
>> >
>> >
>> > # Proposal
>> >
>> > REEF is a Big Data system that makes it easy to implement scalable,
>> > fault-tolerant runtime environments for a range of data processing
>> > models (e.g., graph processing and machine learning) on top of
>> > resource managers such as Apache YARN and Mesos. REEF can efficiently
>> > run multiple heterogeneous frameworks, as well as workflows that
>> > combine them.
>> >
>> > Additionally, REEF contains two libraries that are of independent
>> > value: Wake is an event-based programming framework inspired by Rx and
>> > SEDA. Tang is a dependency injection framework inspired by Google
>> > Guice, but designed specifically for configuring distributed systems.
>> >
>> >
>> > # Background
>> >
>> > The resource management layer, represented by systems such as Apache
>> > YARN and Mesos, has emerged as a critical layer in the new scale-out
>> > data processing stack. Resource managers assume the responsibility of
>> > multiplexing a cluster of shared-nothing machines across heterogeneous
>> > applications. They operate behind an interface for leasing containers
>> > (slices of a machine’s resources) to computations in an elastic
>> > fashion. However, building data processing frameworks directly on this
>> > layer comes at a high cost: each framework must tackle the same
>> > challenges (e.g., fault tolerance, task scheduling, and coordination)
>> > and reimplement common mechanisms (e.g., caching, bulk transfers).
>> >
>> > REEF provides a reusable control plane for scheduling and coordinating
>> > task-level work on cluster resource managers. The REEF design enables
>> > sophisticated optimizations, such as container reuse and data
>> > caching, and facilitates workflows that span multiple frameworks.
>> > Examples include pipelining data between different
>> > operators in a relational system, retaining state across iterations in
>> > iterative or recursive data flow, and passing the result of a
>> > MapReduce job to a Machine Learning computation.
>> >
>> >
>> > # Rationale
>> >
>> > Since REEF is a library that makes it easy to write distributed
>> > applications on top of Apache YARN or Mesos, the Apache Software
>> > Foundation is the perfect home for hosting REEF.
>> >
>> >
>> > # Current Status
>> >
>> > REEF has been developed mostly by Microsoft, UCLA, and Seoul
>> > National University. The REEF codebase is open-sourced under the
>> > Apache License 2.0 and is currently hosted in a public repository at
>> > github.com.
>> >
>> >
>> > # Meritocracy
>> >
>> > We plan to build a strong open community by following the Apache
>> > meritocracy principles. We will work with those who contribute
>> > significantly to the project and invite them to become committers.
>> >
>> >
>> > # Community
>> >
>> > REEF is currently being used internally at Microsoft. In addition, SK
>> > Telecom is building its data analytics infrastructure on top of REEF in
>> > collaboration with Seoul National University. We hope to extend our
>> > contributor base by becoming an Apache Incubator project. We expect
>> > REEF to attract developers who are interested in creating common
>> > building blocks that simplify the development of large-scale Big Data
>> > applications.
>> >
>> >
>> > # Core Developers
>> >
>> > The core developers are engineers from Microsoft, Purestorage, UCB,
>> > UCLA, UW, and Seoul National University.
>> >
>> >
>> > # Alignment
>> >
>> > REEF depends on many Apache projects. It is built on resource
>> > managers such as Apache YARN and Apache Mesos, and it uses HDFS as a
>> > distributed storage layer.
>> >
>> >
>> > # Known Risks
>> > ## Orphaned Products
>> >
>> > The risk of REEF being orphaned is small because Microsoft products
>> > are built on REEF.
>> > The core REEF developers continue to work on REEF
>> > at Microsoft, UCLA, and Seoul National University. The REEF project is
>> > also gaining interest from other institutions as a basis for their
>> > infrastructure.
>> >
>> > ## Inexperience with Open Source
>> >
>> > Several core developers have experience with open source development.
>> > REEF committers will be guided by mentors with strong backgrounds in
>> > Apache open source projects.
>> >
>> > ## Homogeneous Developers
>> >
>> > The initial committers include developers from several institutions,
>> > including Microsoft, Purestorage, UCB, UCLA, and Seoul National
>> > University.
>> >
>> > ## Reliance on Salaried Developers
>> >
>> > Developers from Microsoft are paid to work on REEF. Since the work is
>> > used internally at Microsoft, Microsoft will continue to support its
>> > developers working on REEF. Engineers and graduate students from UCLA,
>> > UCB, UW, and Seoul National University also contribute to REEF. We
>> > plan to attract active developers from other institutions.
>> >
>> > ## Relationships with Other Apache Products
>> >
>> > Given REEF's position in the big data stack, there are three
>> > relationships to consider: projects that fit below, on top of, or
>> > alongside REEF in the stack.
>> >
>> > ### Below REEF: Mesos and YARN
>> >
>> > REEF is designed to facilitate application development on top of
>> > resource managers. Hence, its relationship with these resource
>> > managers is symbiotic by design.
>> >
>> > ### On Top of REEF
>> >
>> > Apache Spark, Giraph, MapReduce, and Flink are only some of the
>> > projects that logically belong at a higher layer of the big data stack
>> > than REEF. None of these currently leverage REEF, so each has had to
>> > solve, on its own, some of the issues that REEF addresses. Our goal is
>> > for REEF to help developers create an even richer set of future big
>> > data frameworks.
>> >
>> > ### Alongside REEF
>> >
>> > Apache hosts several projects that build intermediate library layers
>> > on top of a resource management platform. Twill, Slider, and Tez are
>> > notable examples in the incubator. These projects share many
>> > objectives with REEF (and with each other). We expect these parallel
>> > explorations to converge and differentiate within Apache, as the space
>> > of distributed applications and deployment is too vast for a single
>> > answer.
>> >
>> > Apache Twill and REEF both aim to simplify application development on
>> > top of resource managers. However, they go about this in different
>> > ways. Twill simplifies programming by exposing a programming model
>> > based on Java threads. REEF, on the other hand, provides a set of
>> > common building blocks (e.g., job coordination, state passing, cluster
>> > membership) for building big data processing applications, and it
>> > virtualizes the underlying resource managers. None of this prescribes
>> > a specific programming model. As such, REEF occupies a slot ever so
>> > slightly below Twill in an architecture stack.
>> >
>> > Apache Slider is a framework that makes it easy to deploy and manage
>> > long-running static applications in a YARN cluster. Its focus is on
>> > adapting existing applications such as HBase and Accumulo to run on
>> > YARN with little modification. Therefore, the goals of Slider and REEF
>> > are different.
>> >
>> > Apache Tez is a project to develop a generic Directed Acyclic Graph
>> > (DAG) processing framework with a reusable set of data processing
>> > primitives. The initial focus is to provide improved data processing
>> > capabilities for projects like Apache Hive, Apache Pig, and Cascading.
>> > Tez is still a single framework for DAG processing. In contrast, REEF
>> > provides a generic layer on which diverse computation models (DAG, ML,
>> > graph processing, and interactive query processing) can be built.
>> > More importantly, REEF provides a layer that facilitates sharing
>> > resources and in-memory state across frameworks, and it virtualizes
>> > resource managers. Regarding reusable data processing primitives, Tez
>> > and REEF share the same goal. We hope to collaborate on features that
>> > can be shared between Tez and REEF.
>> >
>> >
>> > ## An Excessive Fascination with the Apache Brand
>> >
>> > The Apache Software Foundation has a reputation as the best place to
>> > host open source projects. We believe that by joining the Apache
>> > Software Foundation we will attract many developers who want to
>> > contribute to innovation in the Big Data platform space.
>> >
>> >
>> > # Documentation
>> >
>> > The current documentation for REEF is at
>> > https://github.com/Microsoft-CISL/REEF as well as on
>> > http://www.reef-project.org
>> >
>> >
>> > # Initial Source
>> >
>> > The REEF codebase is currently hosted at
>> > https://github.com/Microsoft-CISL/REEF.
>> >
>> >
>> > # External Dependencies
>> >
>> > REEF makes extensive use of Java libraries from the Apache Software
>> > Foundation, namely:
>> >
>> > * avro (Apache 2.0)
>> > * hadoop (Apache 2.0)
>> > * hdfs (Apache 2.0)
>> > * yarn (Apache 2.0)
>> > * commons-cli (Apache 2.0)
>> > * commons-configuration (Apache 2.0)
>> > * commons-lang (Apache 2.0)
>> > * commons-logging (Apache 2.0)
>> >
>> > To the best of our knowledge, the remaining external dependencies of
>> > REEF are distributed under Apache-compatible licenses:
>> >
>> > * guava-libraries (Apache 2.0)
>> > * protobuf (BSD)
>> > * asm (BSD)
>> > * netty (Apache 2.0)
>> > * mockito (MIT)
>> > * junit (EPL 1.0)
>> > * slf4j (MIT)
>> >
>> >
>> > # Cryptography
>> >
>> > REEF will depend on secure Hadoop, which can optionally use Kerberos.
>> >
>> > # Required Resources
>> >
>> > ## Mailing Lists
>> >
>> > * reef-private for private PMC discussions
>> > * reef-dev for technical discussions among contributors and
>> >   notification about commits
>> >
>> > ## Subversion Directory
>> >
>> > The REEF team uses Git for source version control:
>> > git://git.apache.org/reef
>> >
>> > ## Issue Tracking
>> >
>> > JIRA REEF (REEF)
>> >
>> > ## Other Resources
>> >
>> > Jenkins continuous integration testing
>> >
>> > # Initial Committers
>> >
>> > * Markus Weimer
>> > * Sergiy Matusevych
>> > * Julia Wang
>> > * Shravan M Narayanamurthy
>> > * Yingda Chen
>> > * Tony Majestro
>> > * Beysim Sezgin
>> > * Boris Shulman
>> > * Russell Sears
>> > * Jung Ryong Lee
>> > * You Sun Jung
>> > * Dong Joon Hyun
>> > * Josh Rosen
>> > * Tyson Condie
>> > * Brandon Myers
>> > * Yunseong Lee
>> > * Taegeon Um
>> > * Youngseok Yang
>> > * Brian Cho
>> > * Byung-Gon Chun
>> >
>> > # Affiliations
>> >
>> > * Microsoft:
>> >   * Markus Weimer
>> >   * Sergiy Matusevych
>> >   * Julia Wang
>> >   * Shravan M Narayanamurthy
>> >   * Yingda Chen
>> >   * Tony Majestro
>> >   * Beysim Sezgin
>> >   * Boris Shulman
>> > * Purestorage:
>> >   * Russell Sears
>> > * SK Telecom:
>> >   * Jung Ryong Lee
>> >   * You Sun Jung
>> >   * Dong Joon Hyun
>> > * University of California:
>> >   * Josh Rosen (Berkeley)
>> >   * Tyson Condie (LA)
>> > * University of Washington:
>> >   * Brandon Myers
>> > * Seoul National University:
>> >   * Yunseong Lee
>> >   * Taegeon Um
>> >   * Youngseok Yang
>> >   * Brian Cho
>> >   * Byung-Gon Chun
>> >
>> >
>> > # Sponsors
>> >
>> > ## Champion
>> > Chris Douglas <[email protected]>
>> >
>> > ## Nominated Mentors
>> > * Chris Mattmann <[email protected]>
>> > * Ross Gardler <[email protected]>
>> > * Owen O'Malley <[email protected]>
>> >
>> > ## Sponsoring Entity
>> > The Apache Incubator
>> >
>
>
> --
> Byung-Gon Chun
