Spark 1.x has been supporting Scala 2.11 for three or four releases now. It seems to me you already provide a clear upgrade path: get on Scala 2.11 before upgrading to Spark 2.x.

From the Scala team when Scala 2.10.6 came out: "We strongly encourage you to upgrade to the latest stable version of Scala 2.11.x, as the 2.10.x series is no longer actively maintained."
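For concreteness, a minimal sbt sketch of that upgrade path, assuming an sbt-built application on Spark 1.6.0 (artifact names and version numbers are illustrative): switch scalaVersion and let %% resolve the _2.11 artifacts that Spark 1.x already publishes.

    // build.sbt -- move the app to Scala 2.11 while staying on Spark 1.x
    scalaVersion := "2.11.7"

    libraryDependencies ++= Seq(
      // %% appends the Scala binary suffix, so this resolves spark-core_2.11
      "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"
    )

With the app already building and running against the 2.11 artifacts, the later move to a 2.x release becomes a version bump rather than a Scala migration.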
On Thu, Dec 3, 2015 at 1:03 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

Reynold's post from Nov. 25:

"I don't think we should drop support for Scala 2.10, or make it harder in terms of operations for people to upgrade.

If there are further objections, I'm going to bump remove the 1.7 version and retarget things to 2.0 on JIRA."

On Thu, Dec 3, 2015 at 12:47 AM, Sean Owen <so...@cloudera.com> wrote:

Reynold, did you (or someone else) delete version 1.7.0 in JIRA? I think that's premature. If there's a 1.7.0 then we've lost info about what it would contain. It's trivial at any later point to merge the versions. And, since things change and there's not a pressing need to decide one way or the other, it seems fine to at least collect this info, like we have things like "1.4.3" that may never be released. I'd like to add it back?

On Thu, Nov 26, 2015 at 9:45 AM, Sean Owen <so...@cloudera.com> wrote:

Maintaining both a 1.7 and 2.0 is too much work for the project, which is over-stretched now. This means that after 1.6 it's just small maintenance releases in 1.x and no substantial features or evolution. This means that the "in progress" APIs in 1.x will stay that way unless one updates to 2.x. It's not unreasonable, but it means the update to the 2.x line isn't going to be that optional for users.

Scala 2.10 is already EOL, right? Supporting it in 2.x means supporting it for a couple more years, note. 2.10 is still used today, but that's the point of the current stable 1.x release in general: if you want to stick to current dependencies, stick to the current release. Although I think that's the right way to think about support across major versions in general, I can see that 2.x is more of a required update for those following the project's fixes and releases. Hence it may indeed be important to just keep supporting 2.10.

I can't see supporting 2.12 at the same time (right?). Is that a concern? It will be long since GA by the time 2.x is first released.

There's another fairly coherent worldview where development continues in 1.7 and focuses on finishing the loose ends and lots of bug fixing. 2.0 is delayed somewhat into next year, and by that time supporting 2.11+2.12 and Java 8 looks more feasible and more in tune with currently deployed versions.

I can't say I have a strong view, but I personally hadn't imagined 2.x would start now.

On Thu, Nov 26, 2015 at 7:00 AM, Reynold Xin <r...@databricks.com> wrote:

I don't think we should drop support for Scala 2.10, or make it harder in terms of operations for people to upgrade.

If there are further objections, I'm going to bump remove the 1.7 version and retarget things to 2.0 on JIRA.

On Wed, Nov 25, 2015 at 12:54 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

I see. My concern is / was that cluster operators will be reluctant to upgrade to 2.0, meaning that developers using those clusters need to stay on 1.x, and, if they want to move to DataFrames, essentially need to port their app twice.
I misunderstood and thought part of the proposal was to drop support for 2.10, though. If your broad point is that there aren't changes in 2.0 that will make it less palatable to cluster administrators than releases in the 1.x line, then yes, 2.0 as the next release sounds fine to me.

-Sandy

On Tue, Nov 24, 2015 at 11:55 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

What are the other breaking changes in 2.0 though? Note that we're not removing Scala 2.10, we're just making the default build be against Scala 2.11 instead of 2.10. There seem to be very few changes that people would worry about. If people are going to update their apps, I think it's better to make the other small changes in 2.0 at the same time than to update once for Dataset and another time for 2.0.

BTW just refer to Reynold's original post for the other proposed API changes.

Matei

On Nov 24, 2015, at 12:27 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

I think that Kostas' logic still holds. The majority of Spark users, and likely an even vaster majority of people running vaster jobs, are still on RDDs and on the cusp of upgrading to DataFrames. Users will probably want to upgrade to the stable version of the Dataset / DataFrame API so they don't need to do so twice. Requiring that they absorb all the other ways that Spark breaks compatibility in the move to 2.0 makes it much more difficult for them to make this transition.

Using the same set of APIs also means that it will be easier to backport critical fixes to the 1.x line.

It's not clear to me that avoiding breakage of an experimental API in the 1.x line outweighs these issues.

-Sandy

On Mon, Nov 23, 2015 at 10:51 PM, Reynold Xin <r...@databricks.com> wrote:

I actually think the next one (after 1.6) should be Spark 2.0. The reason is that I already know we have to break some part of the DataFrame/Dataset API as part of the Dataset design. (e.g. DataFrame.map should return Dataset rather than RDD.) In that case, I'd rather break this sooner (in one release) than later (in two releases), so the damage is smaller.

I don't think whether we call Dataset/DataFrame experimental or not matters too much for 2.0. We can still call Dataset experimental in 2.0 and then mark them as stable in 2.1. Despite being "experimental", there have been no breaking changes to DataFrame from 1.3 to 1.6.
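A minimal sketch of the signature change Reynold is describing, assuming a DataFrame whose first column is a string (the 2.x half is the proposed shape, not a settled API at this point in the thread):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Dataset}

    // Spark 1.x: DataFrame.map leaves the optimized plan and returns a plain RDD
    def lengths1x(df: DataFrame): RDD[Int] =
      df.map(row => row.getString(0).length)

    // The proposed 2.x shape: map stays a Dataset, keeping an Encoder so later
    // operators remain optimizable. Against a 2.x classpath this additionally
    // needs an implicit Encoder[Int] in scope (e.g. import spark.implicits._);
    // it will not compile against 1.x.
    def lengths2x(df: DataFrame): Dataset[Int] =
      df.map(row => row.getString(0).length)

The return type is the breaking part: callers written against the 1.x RDD result would not compile unchanged against the Dataset-returning version.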
On Wed, Nov 18, 2015 at 3:43 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

Ah, got it; by "stabilize" you meant changing the API, not just bug fixing. We're on the same page now.

On Wed, Nov 18, 2015 at 3:39 PM, Kostas Sakellis <kos...@cloudera.com> wrote:

A 1.6.x release will only fix bugs - we typically don't change APIs in z releases. The Dataset API is experimental and so we might be changing the APIs before we declare it stable. This is why I think it is important to first stabilize the Dataset API with a Spark 1.7 release before moving to Spark 2.0. This will benefit users that would like to use the new Dataset APIs but can't move to Spark 2.0 because of the backwards-incompatible changes, like removal of deprecated APIs, Scala 2.11, etc.

Kostas

On Fri, Nov 13, 2015 at 12:26 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

Why does stabilization of those two features require a 1.7 release instead of 1.6.1?

On Fri, Nov 13, 2015 at 11:40 AM, Kostas Sakellis <kos...@cloudera.com> wrote:

We have veered off the topic of Spark 2.0 a little bit here - yes, we can talk about RDD vs. DS/DF more, but let's refocus on Spark 2.0. I'd like to propose we have one more 1.x release after Spark 1.6. This will allow us to stabilize a few of the new features that were added in 1.6:

1) the experimental Datasets API
2) the new unified memory manager

I understand our goal for Spark 2.0 is to offer an easy transition, but there will be users that won't be able to seamlessly upgrade given what we have discussed as in scope for 2.0. For these users, having a 1.x release with these new features/APIs stabilized will be very beneficial. This might make Spark 1.7 a lighter release, but that is not necessarily a bad thing.

Any thoughts on this timeline?

Kostas Sakellis

On Thu, Nov 12, 2015 at 8:39 PM, Cheng, Hao <hao.ch...@intel.com> wrote:

Agree, more features/APIs/optimizations need to be added in DF/DS.

I mean, we need to think about what kind of RDD APIs we have to provide to developers; maybe the fundamental API is enough, like the ShuffledRDD etc. But PairRDDFunctions is probably not in this category, as we can do the same thing easily with DF/DS, with even better performance.

On Fri, Nov 13, 2015 at 11:23 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

Hmmm... to me, that seems like precisely the kind of thing that argues for retaining the RDD API but not as the first thing presented to new Spark developers: "Here's how to use groupBy with DataFrames.... Until the optimizer is more fully developed, that won't always get you the best performance that can be obtained. In these particular circumstances, ..., you may want to use the low-level RDD API while setting preservesPartitioning to true. Like this...."
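A rough sketch of that low-level pattern (key type, values, and partition count are illustrative): a pair RDD that has already been hash-partitioned can be aggregated and transformed without another shuffle, which is exactly the knowledge Spark SQL's groupBy could not yet be told about.

    import org.apache.spark.{HashPartitioner, SparkContext}

    def aggregateColocated(sc: SparkContext): Unit = {
      val part = new HashPartitioner(8)

      // Shuffle once, up front, so records sharing a key are co-located
      val byUser = sc
        .parallelize(Seq(("alice", 1), ("bob", 2), ("alice", 3)))
        .partitionBy(part)
        .cache()

      // reduceByKey with the same partitioner reuses the existing layout,
      // so no further shuffle is performed
      val totals = byUser.reduceByKey(part, _ + _)

      // mapPartitions can keep the partitioner as long as keys are unchanged
      val scaled = byUser.mapPartitions(
        iter => iter.map { case (user, n) => (user, n * 10) },
        preservesPartitioning = true)

      println(totals.partitioner == scaled.partitioner) // both Some(part)
    }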
On Thu, Nov 12, 2015 at 7:05 PM, Stephen Boesch <java...@gmail.com> wrote:

My understanding is that the RDD's presently have more support for complete control of partitioning, which is a key consideration at scale. While partitioning control is still piecemeal in DF/DS it would seem premature to make RDD's a second-tier approach to Spark dev.

An example is the use of groupBy when we know that the source relation (/RDD) is already partitioned on the grouping expressions. AFAIK Spark SQL still does not allow that knowledge to be applied to the optimizer - so a full shuffle will be performed. However in the native RDD we can use preservesPartitioning=true.

2015-11-12 17:42 GMT-08:00 Mark Hamstra <m...@clearstorydata.com>:

The place of the RDD API in 2.0 is also something I've been wondering about. I think it may be going too far to deprecate it, but changing emphasis is something that we might consider. The RDD API came well before DataFrames and DataSets, so programming guides, introductory how-to articles and the like have, to this point, also tended to emphasize RDDs -- or at least to deal with them early. What I'm thinking is that with 2.0 maybe we should overhaul all the documentation to de-emphasize and reposition RDDs. In this scheme, DataFrames and DataSets would be introduced and fully addressed before RDDs. They would be presented as the normal/default/standard way to do things in Spark. RDDs, in contrast, would be presented later as a kind of lower-level, closer-to-the-metal API that can be used in atypical, more specialized contexts where DataFrames or DataSets don't fully fit.

On Thu, Nov 12, 2015 at 5:17 PM, Cheng, Hao <hao.ch...@intel.com> wrote:

I am not sure what the best practice for this specific problem is, but it's really worth thinking about in 2.0, as it is a painful issue for lots of users.

By the way, is it also an opportunity to deprecate the RDD API (or internal API only?)? As lots of its functionality overlaps with DataFrame or DataSet.

Hao
On Fri, Nov 13, 2015 at 5:27 AM, Kostas Sakellis <kos...@cloudera.com> wrote:

I know we want to keep breaking changes to a minimum, but I'm hoping that with Spark 2.0 we can also look at better classpath isolation with user programs. I propose we build on spark.{driver|executor}.userClassPathFirst, setting it true by default, and not allow any Spark transitive dependencies to leak into user code. For backwards compatibility we can have a whitelist if we want, but it'd be good if we start requiring user apps to explicitly pull in all their dependencies. From what I can tell, Hadoop 3 is also moving in this direction.

Kostas
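The configuration keys Kostas refers to already exist in 1.x, off by default; a minimal sketch of turning them on for a single application (the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Prefer classes from the user's jars over Spark's own copies of shared
    // dependencies (e.g. Guava, Jackson) on both the driver and the executors.
    // In 1.x these flags are experimental and default to false.
    val conf = new SparkConf()
      .setAppName("classpath-isolation-example")
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")

    val sc = new SparkContext(conf)

The proposal above would effectively flip those defaults and shade or hide Spark's transitive dependencies so user applications must declare their own.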
On Thu, Nov 12, 2015 at 9:56 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

With regards to Machine learning, it would be great to move useful features from MLlib to ML and deprecate the former. The current structure of two separate machine learning packages seems to be somewhat confusing.

With regards to GraphX, it would be great to deprecate the use of RDD in GraphX and switch to DataFrame. This will allow GraphX to evolve with Tungsten.

On that note of deprecating stuff, it might be good to deprecate some things in 2.0 without removing or replacing them immediately. That way 2.0 doesn't have to wait for everything that we want to deprecate to be replaced all at once.

Nick

On Thu, Nov 12, 2015 at 12:45 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Parameter Server is a new feature and thus does not match the goal of 2.0, which is "to fix things that are broken in the current API and remove certain deprecated APIs". At the same time I would be happy to have that feature.

With regards to Machine learning, it would be great to move useful features from MLlib to ML and deprecate the former. The current structure of two separate machine learning packages seems to be somewhat confusing.

With regards to GraphX, it would be great to deprecate the use of RDD in GraphX and switch to DataFrame. This will allow GraphX to evolve with Tungsten.

Best regards, Alexander
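To make the two-package split concrete, a small sketch using classes from the 1.x codebase (column names and parameter values are illustrative):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.sql.DataFrame

    // spark.mllib: the older RDD-based API, trained on an RDD[LabeledPoint]
    val rddBased = new LogisticRegressionWithLBFGS().setNumClasses(2)

    // spark.ml: the DataFrame-based Pipeline API, trained on a DataFrame
    // with "label" and "features" columns
    def trainDataFrameBased(training: DataFrame) =
      new LogisticRegression().setMaxIter(10).fit(training)

Having two parallel implementations of the same algorithms is the confusion being described; deprecating spark.mllib would leave only the DataFrame-based spark.ml path.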
On Thu, Nov 12, 2015 at 7:28 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

Being specific to Parameter Server, I think the current agreement is that PS shall exist as a third-party library instead of a component of the core code base, isn't it?

Best,

--
Nan Zhu
http://codingcat.me

On Thursday, November 12, 2015 at 9:49 AM, wi...@qq.com wrote:

Who has the idea of machine learning? Spark is missing some features for machine learning, for example the parameter server.

On Nov 12, 2015, at 05:32, Matei Zaharia <matei.zaha...@gmail.com> wrote:

I like the idea of popping out Tachyon to an optional component too, to reduce the number of dependencies. In the future, it might even be useful to do this for Hadoop, but it requires too many API changes to be worth doing now.

Regarding Scala 2.12, we should definitely support it eventually, but I don't think we need to block 2.0 on that because it can be added later too. Has anyone investigated what it would take to run on there? I imagine we don't need many code changes, just maybe some REPL stuff.

Needless to say, but I'm all for the idea of making "major" releases as undisruptive as possible in the model Reynold proposed. Keeping everyone working with the same set of releases is super important.

Matei

On Nov 11, 2015, at 4:58 AM, Sean Owen <so...@cloudera.com> wrote:

On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin <r...@databricks.com> wrote:

"... to the Spark community. A major release should not be very different from a minor release and should not be gated based on new features. The main purpose of a major release is an opportunity to fix things that are broken in the current API and remove certain deprecated APIs (examples follow)."

Agree with this stance. Generally, a major release might also be a time to replace some big old API or implementation with a new one, but I don't see obvious candidates.

I wouldn't mind turning attention to 2.x sooner than later, unless there's a fairly good reason to continue adding features in 1.x to a 1.7 release. The scope as of 1.6 is already pretty darned big.

"1. Scala 2.11 as the default build. We should still support Scala 2.10, but it has been end-of-life."

By the time 2.x rolls around, 2.12 will be the main version, 2.11 will be quite stable, and 2.10 will have been EOL for a while. I'd propose dropping 2.10. Otherwise it's supported for 2 more years.

"2. Remove Hadoop 1 support."

I'd go further to drop support for <2.2 for sure (2.0 and 2.1 were sort of 'alpha' and 'beta' releases) and even <2.6.

I'm sure we'll think of a number of other small things -- shading a bunch of stuff? reviewing and updating dependencies in light of simpler, more recent dependencies to support from Hadoop etc?

Farming out Tachyon to a module? (I felt like someone proposed this?)
Pop out any Docker stuff to another repo?
Continue that same effort for EC2?
Farming out some of the "external" integrations to another repo (? controversial)

See also anything marked version "2+" in JIRA.