I think this will be hard to maintain; we already have JIRA as the de facto central place to store discussions and prioritize work, and the 2.x work already has a JIRA. The wiki doesn't really hurt; it just probably will never be looked at again. Let's point people to JIRA in all cases.
On Tue, Dec 22, 2015 at 11:52 PM, Reynold Xin <r...@databricks.com> wrote:

> I started a wiki page:
> https://cwiki.apache.org/confluence/display/SPARK/Development+Discussions
>
> On Tue, Dec 22, 2015 at 6:27 AM, Tom Graves <tgraves...@yahoo.com> wrote:
>>
>> Do we have a summary of all the discussions and what is planned for 2.0
>> then? Perhaps we should put it on the wiki for reference.
>>
>> Tom
>>
>> On Tuesday, December 22, 2015 12:12 AM, Reynold Xin
>> <r...@databricks.com> wrote:
>>
>> FYI I updated the master branch's Spark version to 2.0.0-SNAPSHOT.
>>
>> On Tue, Nov 10, 2015 at 3:10 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>> I'm starting a new thread since the other one got intermixed with
>> feature requests. Please refrain from making feature requests in this
>> thread. Not that we shouldn't be adding features, but we can always add
>> features in 1.7, 2.1, 2.2, ...
>>
>> First - I want to propose a premise for how to think about Spark 2.0
>> and major releases in Spark, based on discussion with several members
>> of the community: a major release should be low overhead and minimally
>> disruptive to the Spark community. A major release should not be very
>> different from a minor release and should not be gated on new features.
>> The main purpose of a major release is an opportunity to fix things
>> that are broken in the current API and remove certain deprecated APIs
>> (examples follow).
>>
>> For this reason, I would *not* propose doing major releases to break
>> substantial APIs or perform large re-architecting that prevents users
>> from upgrading. Spark has always had a culture of evolving its
>> architecture incrementally and making changes - and I don't think we
>> want to change this model. In fact, we've released many architectural
>> changes on the 1.x line.
>>
>> If the community likes the above model, then to me it seems reasonable
>> to do Spark 2.0 either after Spark 1.6 (in lieu of Spark 1.7) or
>> immediately after Spark 1.7. That would be 18 or 21 months after Spark
>> 1.0. A cadence of major releases every two years seems doable within
>> the above model.
>>
>> Under this model, here is a list of example things I would propose
>> doing in Spark 2.0, separated into APIs and Operation/Deployment:
>>
>> APIs
>>
>> 1. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in
>> Spark 1.x.
>>
>> 2. Remove Akka from Spark's API dependency (in streaming), so user
>> applications can use Akka (SPARK-5293). We have gotten a lot of
>> complaints about user applications being unable to use Akka due to
>> Spark's dependency on Akka (see sketch 1 below).
>>
>> 3. Remove Guava from Spark's public API (JavaRDD Optional) (see
>> sketch 2 below).
>>
>> 4. Better class package structure for low-level developer APIs. In
>> particular, we have some DeveloperApi classes (mostly various
>> listener-related classes) added over the years. Some packages include
>> only one or two public classes but a lot of private classes. A better
>> structure is to have public classes isolated to a few public packages,
>> and these public packages should have minimal private classes for
>> low-level developer APIs.
>>
>> 5. Consolidate the task metric and accumulator APIs. Although they
>> have some subtle differences, the two are very similar but take
>> completely different code paths (see sketch 3 below).
>>
>> 6. Possibly make Catalyst, Dataset, and DataFrame more general by
>> moving them to other package(s). They are already used beyond SQL,
>> e.g. in ML pipelines, and will be used by streaming as well.
>> Operation/Deployment
>>
>> 1. Scala 2.11 as the default build. We should still support Scala
>> 2.10, but it has reached end-of-life.
>>
>> 2. Remove Hadoop 1 support.
>>
>> 3. Assembly-free distribution of Spark: don't require building an
>> enormous assembly jar in order to run Spark.
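
Sketch 1 - API item 2 (Akka coupling). A minimal sketch of how Spark 1.x
streaming exposes Akka types in its public API, assuming the 1.x
actorStream/ActorHelper interface; WordReceiver and the stream name are
made up for illustration:

    import akka.actor.{Actor, Props}
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.receiver.ActorHelper

    // A custom receiver actor. Mixing in Spark's ActorHelper (for store())
    // ties the application to the Akka version Spark was built against.
    class WordReceiver extends Actor with ActorHelper {
      def receive = {
        case s: String => store(s)
      }
    }

    object AkkaCouplingExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("akka-coupling").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))

        // actorStream takes akka.actor.Props, so Akka appears directly in
        // Spark's public streaming API - the coupling SPARK-5293 targets.
        val words = ssc.actorStream[String](Props[WordReceiver], "words")
        words.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Because Props and Actor come from Akka itself, the application has to
compile and run against the Akka version Spark bundles, which is the
conflict the proposal aims to remove.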
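Sketch 2 - API item 3 (Guava in the public API). A minimal sketch showing
where Guava's Optional surfaces in the Spark 1.x Java API; it calls the
Java API from Scala for brevity, and the names are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.api.java.JavaPairRDD
    import scala.collection.JavaConverters._

    object GuavaLeakExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("guava-leak").setMaster("local[*]"))

        val left  = JavaPairRDD.fromRDD(sc.parallelize(Seq(("a", 1), ("b", 2))))
        val right = JavaPairRDD.fromRDD(sc.parallelize(Seq(("a", 10))))

        // The value side of the result is Guava's Optional, so every caller
        // of the Java API is pinned to the Guava version Spark ships with.
        val joined: JavaPairRDD[String, (Int, com.google.common.base.Optional[Int])] =
          left.leftOuterJoin(right)

        joined.collect().asScala.foreach(println)
        sc.stop()
      }
    }

Any Java caller that consumes the join result has to import
com.google.common.base.Optional, tying the application to Spark's Guava
version.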
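Sketch 3 - API item 5 (metrics vs. accumulators). Accumulators and task
metrics both merge per-task values back on the driver, but accumulators
use the user-facing Accumulable path while task metrics (bytes read,
records written, etc.) travel a separate internal path surfaced through
SparkListener. A minimal sketch of the accumulator side under the 1.x
API; names are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object AccumulatorExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("acc-example").setMaster("local[*]"))

        // A named accumulator: tasks add to it on the executors, and the
        // driver reads the merged value once the job completes.
        val badRecords = sc.accumulator(0L, "badRecords")

        sc.parallelize(1 to 1000).foreach { i =>
          if (i % 100 == 0) badRecords += 1L
        }

        println(s"bad records: ${badRecords.value}")  // prints 10
        sc.stop()
      }
    }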