Thanks for the reference Alex. It answers my question regarding the path you chose.
-Taylor

> On Nov 13, 2015, at 12:13 AM, Alexander Bezzubov <abezzu...@nflabs.com> wrote:
>
> Hi,
>
> it looks pretty interesting, especially a part about integration with
> Zeppelin as another Scala interpreter implementation.
>
> AFAIK there was a discussion on including Spark-Kernel to spark core
> https://issues.apache.org/jira/browse/SPARK-4605 but not sure about a
> possibility of becoming a sub-project one.
>
> Would be interesting to know as indeed it looks very aligned with Apache
> Spark.
>
> --
> Alex
>
>> On Fri, Nov 13, 2015 at 10:05 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
>>
>> Just a quick (or maybe not :) ) question...
>>
>> Given the tight coupling to the Apache Spark project, were there any
>> considerations or discussions with the Spark community regarding including
>> the Spark-Kernel functionality outright in Spark, or the possibility of
>> becoming a subproject?
>>
>> I'm just curious. I don't think an answer one way or another would
>> necessarily block incubation.
>>
>> -Taylor
>>
>>> On Nov 12, 2015, at 7:17 PM, da...@fallside.com wrote:
>>>
>>> Hello, we would like to start a discussion on accepting the Spark-Kernel,
>>> a mechanism for applications to interactively and remotely access Apache
>>> Spark, into the Apache Incubator.
>>>
>>> The proposal is available online at
>>> https://wiki.apache.org/incubator/SparkKernelProposal, and it is appended
>>> to this email.
>>>
>>> We are looking for additional mentors to help with this project, and we
>>> would much appreciate your guidance and advice.
>>>
>>> Thank-you in advance,
>>> David Fallside
>>>
>>>
>>>
>>> = Spark-Kernel Proposal =
>>>
>>> == Abstract ==
>>> Spark-Kernel provides applications with a mechanism to interactively and
>>> remotely access Apache Spark.
>>>
>>> == Proposal ==
>>> The Spark-Kernel enables interactive applications to access Apache Spark
>>> clusters.
>>> More specifically:
>>> * Applications can send code-snippets and libraries for execution by Spark
>>> * Applications can be deployed separately from Spark clusters and
>>> communicate with the Spark-Kernel using the provided Spark-Kernel client
>>> * Execution results and streaming data can be sent back to calling
>>> applications
>>> * Applications no longer have to be network connected to the workers on a
>>> Spark cluster because the Spark-Kernel acts as each application’s proxy
>>> * Work has started on enabling Spark-Kernel to support languages in
>>> addition to Scala, namely Python (with PySpark), R (with SparkR), and SQL
>>> (with SparkSQL)
>>>
>>> == Background & Rationale ==
>>> Apache Spark provides applications with a fast and general-purpose
>>> distributed computing engine that supports static and streaming data,
>>> tabular and graph representations of data, and an extensive collection of
>>> machine learning libraries. Consequently, a wide variety of applications
>>> will be written for Spark and there will be interactive applications that
>>> require relatively frequent function evaluations, and batch-oriented
>>> applications that require one-shot or only occasional evaluation.
>>>
>>> Apache Spark provides two mechanisms for applications to connect with
>>> Spark. The primary mechanism launches applications on Spark clusters using
>>> spark-submit
>>> (http://spark.apache.org/docs/latest/submitting-applications.html); this
>>> requires developers to bundle their application code plus any dependencies
>>> into JAR files, and then submit them to Spark. A second mechanism is an
>>> ODBC/JDBC API
>>> (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
>>> which enables applications to issue SQL queries against SparkSQL.
>>>
>>> Our experience when developing interactive applications, such as analytic
>>> applications and Jupyter Notebooks, to run against Spark was that the
>>> spark-submit mechanism was overly cumbersome and slow (requiring JAR
>>> creation and forking processes to run spark-submit), and the SQL interface
>>> was too limiting and did not offer easy access to components other than
>>> SparkSQL, such as streaming. The most promising mechanism provided by
>>> Apache Spark was the command-line shell
>>> (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
>>> which enabled us to execute code snippets and dynamically control the
>>> tasks submitted to a Spark cluster. Spark does not provide the
>>> command-line shell as a consumable service but it provided us with the
>>> starting point from which we developed the Spark-Kernel.
>>>
>>> == Current Status ==
>>> Spark-Kernel was first developed by a small team working on an
>>> internal-IBM Spark-related project in July 2014. In recognition of its
>>> likely general utility to Spark users and developers, in November 2014 the
>>> Spark-Kernel project was moved to GitHub and made available under the
>>> Apache License V2.
>>>
>>> == Meritocracy ==
>>> The current developers are familiar with the meritocratic open source
>>> development process at Apache. As the project has gathered interest at
>>> GitHub the developers have actively started a process to invite additional
>>> developers into the project, and we have at least one new developer who is
>>> ready to contribute code to the project.
>>>
>>> == Community ==
>>> We started building a community around the Spark-Kernel project when we
>>> moved it to GitHub about one year ago. Since then we have grown to about
>>> 70 people, and there are regular requests and suggestions from the
>>> community.
>>> We believe that providing Apache Spark application developers
>>> with a general-purpose and interactive API holds a lot of community
>>> potential, especially considering possible tie-ins with the Jupyter and
>>> data science communities.
>>>
>>> == Core Developers ==
>>> The core developers of the project are currently all from IBM, from the
>>> IBM Emerging Technology team and from IBM’s recently formed Spark
>>> Technology Center.
>>>
>>> == Alignment ==
>>> Apache, as the home of Apache Spark, is the most natural home for the
>>> Spark-Kernel project because it was designed to work with Apache Spark and
>>> to provide capabilities for interactive applications and data science
>>> tools not provided by Spark itself.
>>>
>>> The Spark-Kernel also has an affinity with Jupyter (jupyter.org) because
>>> it uses the Jupyter protocol for communications, and so Jupyter Notebooks
>>> can directly use the Spark-Kernel as a kernel for communicating with
>>> Apache Spark. However, we believe that the Spark-Kernel provides a
>>> general-purpose mechanism enabling a wider variety of applications than
>>> just Notebooks to access Spark, and so the Spark-Kernel’s greatest
>>> affinity is with Apache and Apache Spark.
>>>
>>> == Known Risks ==
>>> === Orphaned products ===
>>> We believe the Spark-Kernel project has a low risk of abandonment due to
>>> interest in its continuing existence from several parties. More
>>> specifically, the Spark-Kernel provides a capability that is not provided
>>> by Apache Spark today and that enables a wider range of applications to
>>> leverage Spark. For example, IBM uses (or is considering using) the
>>> Spark-Kernel in several offerings including its IBM Analytics for Apache
>>> Spark product in the Bluemix Cloud. There are also a couple of other
>>> commercial users who are using or considering its use in their offerings.
>>> Furthermore, Jupyter Notebooks are used by data scientists and Spark is
>>> gaining popularity as an analytic engine for them. Jupyter Notebooks are
>>> very easily enabled with the Spark-Kernel and so there is another
>>> constituency for it.
>>>
>>> === Inexperience with Open Source ===
>>> The Spark-Kernel project has been running as an open-source project
>>> (albeit with only IBM committers) for the past several months. The project
>>> has an active issue tracker and due to the interest indicated by the
>>> nature and volume of requests and comments, the team has publicly stated
>>> it is beginning to build a process so they can accept third-party
>>> contributions to the project.
>>>
>>> === Relationships with Other Apache Products ===
>>> The Spark-Kernel has a clear affinity with the Apache Spark project
>>> because it is designed to provide capabilities for interactive
>>> applications and data science tools not provided by Spark itself. The
>>> Spark-Kernel can be a back-end for the Zeppelin project currently
>>> incubating at Apache. There is interest from the Spark-Kernel community in
>>> developing this capability and an experimental branch has been started.
>>>
>>> === Homogeneous Developers ===
>>> The current group of developers working on Spark-Kernel are all from IBM
>>> although the group is in the process of expanding its membership to
>>> include members of the GitHub community who are not from IBM and who have
>>> been active in the Spark-Kernel community on GitHub.
>>>
>>> === Reliance on Salaried Developers ===
>>> The initial committers are full-time employees at IBM although not all
>>> work on the project full-time.
>>>
>>> === Excessive Fascination with the Apache Brand ===
>>> We believe the Spark-Kernel benefits Apache Spark application developers,
>>> and we are interested in an Apache Spark-Kernel project to benefit these
>>> developers by engaging a larger community, facilitating closer ties with
>>> the existing Spark project, and yes, gaining more visibility for the
>>> Spark-Kernel as a solution.
>>>
>>> We have recently become aware that the project name “Spark-Kernel” may be
>>> interpreted as having an association with an Apache project. If the
>>> project is accepted by Apache, we suggest the project name remains the
>>> same, but otherwise we will change it to one that does not imply any
>>> Apache association.
>>>
>>> === Documentation ===
>>> Comprehensive documentation including “Getting Started”, API
>>> specifications and a Roadmap are available from the GitHub project, see
>>> https://github.com/ibm-et/spark-kernel/wiki.
>>>
>>> === Initial Source ===
>>> The source code resides at https://github.com/ibm-et/spark-kernel.
>>>
>>> === External Dependencies ===
>>> The Spark-Kernel depends upon a number of Apache projects:
>>> * Spark
>>> * Hadoop
>>> * Ivy
>>> * Commons
>>>
>>> The Spark-Kernel also depends upon a number of other open source projects:
>>> * JeroMQ (LGPL with Static Linking Exception,
>>> http://zeromq.org/area:licensing)
>>> * Akka (MIT)
>>> * JOpt Simple (MIT)
>>> * Spring Framework Core (Apache v2)
>>> * Play (Apache v2)
>>> * SLF4J (MIT)
>>> * Scala
>>> * Scalatest (Apache v2)
>>> * Scalactic (Apache v2)
>>> * Mockito (MIT)
>>>
>>> == Required Resources ==
>>> Developer and user mailing lists
>>> * priv...@spark-kernel.incubator.apache.org (with moderated subscriptions)
>>> * comm...@spark-kernel.incubator.apache.org
>>> * d...@spark-kernel.incubator.apache.org
>>> * us...@spark-kernel.incubator.apache.org
>>>
>>> A git repository:
>>> https://git-wip-us.apache.org/repos/asf/incubator-spark-kernel.git
>>>
>>> A JIRA issue tracker: https://issues.apache.org/jira/browse/SPARK-KERNEL
>>>
>>> == Initial Committers ==
>>> The initial list of committers is:
>>> * Leugim Bustelo (g...@bustelos.com)
>>> * Jakob Odersky (joder...@gmail.com)
>>> * Luciano Resende (lrese...@apache.org)
>>> * Robert Senkbeil (chip.senkb...@gmail.com)
>>> * Corey Stubbs (cas5...@gmail.com)
>>> * Miao Wang (wm...@hotmail.com)
>>> * Sean Welleck (welle...@gmail.com)
>>>
>>> === Affiliations ===
>>> All of the initial committers are employed by IBM.
>>>
>>> == Sponsors ==
>>> === Champion ===
>>> * Sam Ruby (IBM)
>>>
>>> === Nominated Mentors ===
>>> * Luciano Resende
>>>
>>> We wish to recruit additional mentors during incubation.
>>>
>>> === Sponsoring Entity ===
>>> The Apache Incubator.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org
>
> --
> Kind regards,
> Alexander.