+1

2018-06-15 14:55 GMT-07:00 Reynold Xin <r...@databricks.com>:
> Yes. At this rate I think it's better to do 2.4 next, followed by 3.0.
>
>
> On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>
>> I agree, I don't see a pressing need for a major version bump either.
>>
>>
>> Regards,
>> Mridul
>> On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra <m...@clearstorydata.com> wrote:
>> >
>> > Changing major version numbers is not about new features or a vague notion that it is time to do something that will be seen to be a significant release. It is about breaking stable public APIs.
>> >
>> > I still remain unconvinced that the next version can't be 2.4.0.
>> >
>> > On Fri, Jun 15, 2018 at 1:34 AM Andy <andyye...@gmail.com> wrote:
>> >>
>> >> Dear all:
>> >>
>> >> It has been 2 months since this topic was proposed. Any progress? 2018 is already about half over.
>> >>
>> >> I agree that the new version should have some exciting new features. How about this one:
>> >>
>> >> 6. An ML/DL framework integrated as a core component and feature (such as Angel / BigDL / ……)
>> >>
>> >> 3.0 is a very important version for a good open source project. It would be better to shed the historical burden and focus on new areas. Spark has been widely used all over the world as a successful big data framework. And it can be better than that.
>> >>
>> >> Andy
>> >>
>> >>
>> >> On Thu, Apr 5, 2018 at 7:20 AM Reynold Xin <r...@databricks.com> wrote:
>> >>>
>> >>> There was a discussion thread on scala-contributors about Apache Spark not yet supporting Scala 2.12, and that got me to think perhaps it is about time for Spark to work towards the 3.0 release. By the time it comes out, it will be more than 2 years since Spark 2.0.
>> >>>
>> >>> For contributors less familiar with Spark’s history, I want to give more context on Spark releases:
>> >>>
>> >>> 1. Timeline: Spark 1.0 was released May 2014. Spark 2.0 was July 2016. If we were to maintain the ~2 year cadence, it is time to work on Spark 3.0 in 2018.
>> >>>
>> >>> 2. Spark’s versioning policy promises that Spark does not break stable APIs in feature releases (e.g. 2.1, 2.2). API-breaking changes are sometimes a necessary evil, and can be done in major releases (e.g. 1.6 to 2.0, 2.x to 3.0).
>> >>>
>> >>> 3. That said, a major version isn’t necessarily the playground for disruptive API changes that make it painful for users to update. The main purpose of a major release is to provide an opportunity to fix things that are broken in the current API and to remove certain deprecated APIs.
>> >>>
>> >>> 4. Spark as a project has a culture of evolving architecture and developing major new features incrementally, so major releases are not the only time for exciting new features. For example, the bulk of the work in the move towards the DataFrame API was done in Spark 1.3, and Continuous Processing was introduced in Spark 2.3. Both were feature releases rather than major releases.
>> >>>
>> >>>
>> >>> You can find more background in the thread discussing Spark 2.0: http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-Spark-2-0-td15122.html
>> >>>
>> >>>
>> >>> The primary motivating factor IMO for a major version bump is to support Scala 2.12, which requires minor API-breaking changes to Spark’s APIs. Similar to Spark 2.0, I think there are also opportunities for other changes that we know have been biting us for a long time but can’t be changed in feature releases (to be clear, I’m actually not sure they are all good ideas, but I’m writing them down as candidates for consideration):
>> >>>
>> >>> 1. Support Scala 2.12.
>> >>>
>> >>> 2. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in Spark 2.x.
>> >>>
>> >>> 3. Shade all dependencies.
>> >>>
>> >>> 4. Change the reserved keywords in Spark SQL to be more ANSI-SQL compliant, to prevent users from shooting themselves in the foot, e.g. “SELECT 2 SECOND” -- is “SECOND” an interval unit or an alias? To make it less painful for users to upgrade here, I’d suggest creating a flag for backward compatibility mode.
>> >>>
>> >>> 5. Similar to 4, make our type coercion rules in DataFrame/SQL more standard-compliant, and have a flag for backward compatibility.
>> >>>
>> >>> 6. Miscellaneous other small changes documented in JIRA already (e.g. “JavaPairRDD flatMapValues requires function returning Iterable, not Iterator”, “Prevent column name duplication in temporary view”).
>> >>>
>> >>>
>> >>> Now the reality of a major version bump is that the world often thinks in terms of what exciting features are coming. I do think there are a number of major changes happening already that can be part of the 3.0 release, if they make it in:
>> >>>
>> >>> 1. Scala 2.12 support (listing it twice)
>> >>> 2. Continuous Processing non-experimental
>> >>> 3. Kubernetes support non-experimental
>> >>> 4. A more fleshed out version of data source API v2 (I don’t think it is realistic to stabilize that in one release)
>> >>> 5. Hadoop 3.0 support
>> >>> 6. ...
>> >>>
>> >>>
>> >>> Similar to the 2.0 discussion, this thread should focus on the framework and whether it’d make sense to create Spark 3.0 as the next release, rather than the individual feature requests. Those are important but are best done in their own separate threads.
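
For readers who want the versioning rule from the quoted thread in concrete form (feature releases keep stable APIs intact, and only a break in a stable public API calls for a major bump), here is a minimal Python sketch. The names `next_release` and `format_version` are hypothetical helpers made up for illustration; they are not part of Spark or its release tooling, and the sketch ignores maintenance (patch) releases.

```python
from typing import Tuple

# (major, minor, patch), e.g. (2, 3, 0) for Spark 2.3.0
Version = Tuple[int, int, int]


def next_release(current: Version, breaks_stable_api: bool) -> Version:
    """Pick the next release number under the policy described in the thread.

    Assumption: the only input that matters is whether the release breaks a
    stable public API. New features alone never force a major bump.
    """
    major, minor, _patch = current
    if breaks_stable_api:
        # Breaking a stable API is only allowed in a major release.
        return (major + 1, 0, 0)
    # Otherwise the next feature release just increments the minor number.
    return (major, minor + 1, 0)


def format_version(version: Version) -> str:
    """Render a version tuple as a dotted string."""
    return ".".join(str(part) for part in version)


if __name__ == "__main__":
    current = (2, 3, 0)
    print(format_version(next_release(current, breaks_stable_api=False)))
    print(format_version(next_release(current, breaks_stable_api=True)))
```

Under these assumptions, a release after 2.3.0 that adds features without breaking stable APIs comes out as 2.4.0, which matches the position taken in the replies above; only a change such as the Scala 2.12 API break would move it to 3.0.0.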