Spark 1.x has been supporting Scala 2.11 for three or four releases now. It seems to me you already provide a clear upgrade path: get on Scala 2.11 before upgrading to Spark 2.x.

From the Scala team when Scala 2.10.6 came out: "We strongly encourage you to upgrade to the latest stable version of Scala 2.11.x, as the 2.10.x series is no longer actively maintained."
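For concreteness, a minimal sbt sketch of that upgrade path, assuming an sbt-built application on Spark 1.6.0 (artifact names and version numbers are illustrative): switch scalaVersion and let %% resolve the _2.11 artifacts that Spark 1.x already publishes.

    // build.sbt -- move the app to Scala 2.11 while staying on Spark 1.x
    scalaVersion := "2.11.7"

    libraryDependencies ++= Seq(
      // %% appends the Scala binary suffix, so this resolves spark-core_2.11
      "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"
    )

With the app already building and running against the 2.11 artifacts, the later move to a 2.x release becomes a version bump rather than a Scala migration.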
On Thu, Dec 3, 2015 at 1:03 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

Reynold's post from Nov. 25:

"I don't think we should drop support for Scala 2.10, or make it harder in terms of operations for people to upgrade.

If there are further objections, I'm going to bump remove the 1.7 version and retarget things to 2.0 on JIRA."

On Thu, Dec 3, 2015 at 12:47 AM, Sean Owen <so...@cloudera.com> wrote:

Reynold, did you (or someone else) delete version 1.7.0 in JIRA? I think that's premature. If there's a 1.7.0 then we've lost info about what it would contain. It's trivial at any later point to merge the versions. And, since things change and there's not a pressing need to decide one way or the other, it seems fine to at least collect this info, like we have things like "1.4.3" that may never be released. I'd like to add it back?

On Thu, Nov 26, 2015 at 9:45 AM, Sean Owen <so...@cloudera.com> wrote:

Maintaining both a 1.7 and 2.0 is too much work for the project, which is over-stretched now. This means that after 1.6 it's just small maintenance releases in 1.x and no substantial features or evolution. This means that the "in progress" APIs in 1.x will stay that way unless one updates to 2.x. It's not unreasonable, but it means the update to the 2.x line isn't going to be that optional for users.

Scala 2.10 is already EOL, right? Supporting it in 2.x means supporting it for a couple more years, note. 2.10 is still used today, but that's the point of the current stable 1.x release in general: if you want to stick to current dependencies, stick to the current release. Although I think that's the right way to think about support across major versions in general, I can see that 2.x is more of a required update for those following the project's fixes and releases. Hence it may indeed be important to just keep supporting 2.10.

I can't see supporting 2.12 at the same time (right?). Is that a concern? It will be long since GA by the time 2.x is first released.

There's another fairly coherent worldview where development continues in 1.7 and focuses on finishing the loose ends and lots of bug fixing. 2.0 is delayed somewhat into next year, and by that time supporting 2.11+2.12 and Java 8 looks more feasible and more in tune with currently deployed versions.

I can't say I have a strong view, but I personally hadn't imagined 2.x would start now.

On Thu, Nov 26, 2015 at 7:00 AM, Reynold Xin <r...@databricks.com> wrote:

I don't think we should drop support for Scala 2.10, or make it harder in terms of operations for people to upgrade.

If there are further objections, I'm going to bump remove the 1.7 version and retarget things to 2.0 on JIRA.

On Wed, Nov 25, 2015 at 12:54 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

I see. My concern is / was that cluster operators will be reluctant to upgrade to 2.0, meaning that developers using those clusters need to stay on 1.x, and, if they want to move to DataFrames, essentially need to port their app twice.
I misunderstood and thought part of the proposal was to drop support for 2.10, though. If your broad point is that there aren't changes in 2.0 that will make it less palatable to cluster administrators than releases in the 1.x line, then yes, 2.0 as the next release sounds fine to me.

-Sandy

On Tue, Nov 24, 2015 at 11:55 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

What are the other breaking changes in 2.0 though? Note that we're not removing Scala 2.10, we're just making the default build be against Scala 2.11 instead of 2.10. There seem to be very few changes that people would worry about. If people are going to update their apps, I think it's better to make the other small changes in 2.0 at the same time than to update once for Dataset and another time for 2.0.

BTW just refer to Reynold's original post for the other proposed API changes.

Matei

On Nov 24, 2015, at 12:27 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

I think that Kostas' logic still holds. The majority of Spark users, and likely an even vaster majority of people running vaster jobs, are still on RDDs and on the cusp of upgrading to DataFrames. Users will probably want to upgrade to the stable version of the Dataset / DataFrame API so they don't need to do so twice. Requiring that they absorb all the other ways that Spark breaks compatibility in the move to 2.0 makes it much more difficult for them to make this transition.

Using the same set of APIs also means that it will be easier to backport critical fixes to the 1.x line.

It's not clear to me that avoiding breakage of an experimental API in the 1.x line outweighs these issues.

-Sandy

On Mon, Nov 23, 2015 at 10:51 PM, Reynold Xin <r...@databricks.com> wrote:

I actually think the next one (after 1.6) should be Spark 2.0. The reason is that I already know we have to break some part of the DataFrame/Dataset API as part of the Dataset design. (e.g. DataFrame.map should return Dataset rather than RDD.) In that case, I'd rather break this sooner (in one release) than later (in two releases), so the damage is smaller.

I don't think whether we call Dataset/DataFrame experimental or not matters too much for 2.0. We can still call Dataset experimental in 2.0 and then mark them as stable in 2.1. Despite being "experimental", there have been no breaking changes to DataFrame from 1.3 to 1.6.
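A minimal sketch of the signature change Reynold is describing, assuming a DataFrame whose first column is a string (the 2.x half is the proposed shape, not a settled API at this point in the thread):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Dataset}

    // Spark 1.x: DataFrame.map leaves the optimized plan and returns a plain RDD
    def lengths1x(df: DataFrame): RDD[Int] =
      df.map(row => row.getString(0).length)

    // The proposed 2.x shape: map stays a Dataset, keeping an Encoder so later
    // operators remain optimizable. Against a 2.x classpath this additionally
    // needs an implicit Encoder[Int] in scope (e.g. import spark.implicits._);
    // it will not compile against 1.x.
    def lengths2x(df: DataFrame): Dataset[Int] =
      df.map(row => row.getString(0).length)

The return type is the breaking part: callers written against the 1.x RDD result would not compile unchanged against the Dataset-returning version.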
On Wed, Nov 18, 2015 at 3:43 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

Ah, got it; by "stabilize" you meant changing the API, not just bug fixing. We're on the same page now.

On Wed, Nov 18, 2015 at 3:39 PM, Kostas Sakellis <kos...@cloudera.com> wrote:

A 1.6.x release will only fix bugs - we typically don't change APIs in z releases. The Dataset API is experimental and so we might be changing the APIs before we declare it stable. This is why I think it is important to first stabilize the Dataset API with a Spark 1.7 release before moving to Spark 2.0. This will benefit users that would like to use the new Dataset APIs but can't move to Spark 2.0 because of the backwards-incompatible changes, like removal of deprecated APIs, Scala 2.11, etc.

Kostas

On Fri, Nov 13, 2015 at 12:26 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

Why does stabilization of those two features require a 1.7 release instead of 1.6.1?

On Fri, Nov 13, 2015 at 11:40 AM, Kostas Sakellis <kos...@cloudera.com> wrote:

We have veered off the topic of Spark 2.0 a little bit here - yes, we can talk about RDD vs. DS/DF more, but let's refocus on Spark 2.0. I'd like to propose we have one more 1.x release after Spark 1.6. This will allow us to stabilize a few of the new features that were added in 1.6:

1) the experimental Datasets API
2) the new unified memory manager

I understand our goal for Spark 2.0 is to offer an easy transition, but there will be users that won't be able to seamlessly upgrade given what we have discussed as in scope for 2.0. For these users, having a 1.x release with these new features/APIs stabilized will be very beneficial. This might make Spark 1.7 a lighter release, but that is not necessarily a bad thing.

Any thoughts on this timeline?

Kostas Sakellis

On Thu, Nov 12, 2015 at 8:39 PM, Cheng, Hao <hao.ch...@intel.com> wrote:

Agree, more features/APIs/optimizations need to be added in DF/DS.

I mean, we need to think about what kind of RDD APIs we have to provide to developers; maybe the fundamental API is enough, like the ShuffledRDD etc. But PairRDDFunctions is probably not in this category, as we can do the same thing easily with DF/DS, with even better performance.

On Fri, Nov 13, 2015 at 11:23 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

Hmmm... to me, that seems like precisely the kind of thing that argues for retaining the RDD API but not as the first thing presented to new Spark developers: "Here's how to use groupBy with DataFrames.... Until the optimizer is more fully developed, that won't always get you the best performance that can be obtained. In these particular circumstances, ..., you may want to use the low-level RDD API while setting preservesPartitioning to true. Like this...."
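A rough sketch of that low-level pattern (key type, values, and partition count are illustrative): a pair RDD that has already been hash-partitioned can be aggregated and transformed without another shuffle, which is exactly the knowledge Spark SQL's groupBy could not yet be told about.

    import org.apache.spark.{HashPartitioner, SparkContext}

    def aggregateColocated(sc: SparkContext): Unit = {
      val part = new HashPartitioner(8)

      // Shuffle once, up front, so records sharing a key are co-located
      val byUser = sc
        .parallelize(Seq(("alice", 1), ("bob", 2), ("alice", 3)))
        .partitionBy(part)
        .cache()

      // reduceByKey with the same partitioner reuses the existing layout,
      // so no further shuffle is performed
      val totals = byUser.reduceByKey(part, _ + _)

      // mapPartitions can keep the partitioner as long as keys are unchanged
      val scaled = byUser.mapPartitions(
        iter => iter.map { case (user, n) => (user, n * 10) },
        preservesPartitioning = true)

      println(totals.partitioner == scaled.partitioner) // both Some(part)
    }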
On Thu, Nov 12, 2015 at 7:05 PM, Stephen Boesch <java...@gmail.com> wrote:

My understanding is that the RDD's presently have more support for complete control of partitioning, which is a key consideration at scale. While partitioning control is still piecemeal in DF/DS it would seem premature to make RDD's a second-tier approach to Spark dev.

An example is the use of groupBy when we know that the source relation (/RDD) is already partitioned on the grouping expressions. AFAIK Spark SQL still does not allow that knowledge to be applied to the optimizer - so a full shuffle will be performed. However in the native RDD we can use preservesPartitioning=true.

2015-11-12 17:42 GMT-08:00 Mark Hamstra <m...@clearstorydata.com>:

The place of the RDD API in 2.0 is also something I've been wondering about. I think it may be going too far to deprecate it, but changing emphasis is something that we might consider. The RDD API came well before DataFrames and DataSets, so programming guides, introductory how-to articles and the like have, to this point, also tended to emphasize RDDs -- or at least to deal with them early. What I'm thinking is that with 2.0 maybe we should overhaul all the documentation to de-emphasize and reposition RDDs. In this scheme, DataFrames and DataSets would be introduced and fully addressed before RDDs. They would be presented as the normal/default/standard way to do things in Spark. RDDs, in contrast, would be presented later as a kind of lower-level, closer-to-the-metal API that can be used in atypical, more specialized contexts where DataFrames or DataSets don't fully fit.

On Thu, Nov 12, 2015 at 5:17 PM, Cheng, Hao <hao.ch...@intel.com> wrote:

I am not sure what the best practice for this specific problem is, but it's really worth thinking about in 2.0, as it is a painful issue for lots of users.

By the way, is it also an opportunity to deprecate the RDD API (or internal API only?)? As lots of its functionality overlaps with DataFrame or DataSet.

Hao
On Fri, Nov 13, 2015 at 5:27 AM, Kostas Sakellis <kos...@cloudera.com> wrote:

I know we want to keep breaking changes to a minimum, but I'm hoping that with Spark 2.0 we can also look at better classpath isolation with user programs. I propose we build on spark.{driver|executor}.userClassPathFirst, setting it true by default, and not allow any Spark transitive dependencies to leak into user code. For backwards compatibility we can have a whitelist if we want, but it'd be good if we start requiring user apps to explicitly pull in all their dependencies. From what I can tell, Hadoop 3 is also moving in this direction.

Kostas
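The configuration keys Kostas refers to already exist in 1.x, off by default; a minimal sketch of turning them on for a single application (the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Prefer classes from the user's jars over Spark's own copies of shared
    // dependencies (e.g. Guava, Jackson) on both the driver and the executors.
    // In 1.x these flags are experimental and default to false.
    val conf = new SparkConf()
      .setAppName("classpath-isolation-example")
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")

    val sc = new SparkContext(conf)

The proposal above would effectively flip those defaults and shade or hide Spark's transitive dependencies so user applications must declare their own.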
On Thu, Nov 12, 2015 at 9:56 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

With regards to Machine learning, it would be great to move useful features from MLlib to ML and deprecate the former. The current structure of two separate machine learning packages seems to be somewhat confusing.

With regards to GraphX, it would be great to deprecate the use of RDD in GraphX and switch to DataFrame. This will allow GraphX to evolve with Tungsten.

On that note of deprecating stuff, it might be good to deprecate some things in 2.0 without removing or replacing them immediately. That way 2.0 doesn't have to wait for everything that we want to deprecate to be replaced all at once.

Nick

On Thu, Nov 12, 2015 at 12:45 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Parameter Server is a new feature and thus does not match the goal of 2.0, which is "to fix things that are broken in the current API and remove certain deprecated APIs". At the same time I would be happy to have that feature.

With regards to Machine learning, it would be great to move useful features from MLlib to ML and deprecate the former. The current structure of two separate machine learning packages seems to be somewhat confusing.

With regards to GraphX, it would be great to deprecate the use of RDD in GraphX and switch to DataFrame. This will allow GraphX to evolve with Tungsten.

Best regards, Alexander
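To make the two-package split concrete, a small sketch using classes from the 1.x codebase (column names and parameter values are illustrative):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.sql.DataFrame

    // spark.mllib: the older RDD-based API, trained on an RDD[LabeledPoint]
    val rddBased = new LogisticRegressionWithLBFGS().setNumClasses(2)

    // spark.ml: the DataFrame-based Pipeline API, trained on a DataFrame
    // with "label" and "features" columns
    def trainDataFrameBased(training: DataFrame) =
      new LogisticRegression().setMaxIter(10).fit(training)

Having two parallel implementations of the same algorithms is the confusion being described; deprecating spark.mllib would leave only the DataFrame-based spark.ml path.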
On Thu, Nov 12, 2015 at 7:28 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

Being specific to Parameter Server, I think the current agreement is that PS shall exist as a third-party library instead of a component of the core code base, isn't it?

Best,

--
Nan Zhu
http://codingcat.me

On Thursday, November 12, 2015 at 9:49 AM, wi...@qq.com wrote:

Who has the idea of machine learning? Spark is missing some features for machine learning, for example the parameter server.

On Nov 12, 2015, at 05:32, Matei Zaharia <matei.zaha...@gmail.com> wrote:

I like the idea of popping out Tachyon to an optional component too, to reduce the number of dependencies. In the future, it might even be useful to do this for Hadoop, but it requires too many API changes to be worth doing now.

Regarding Scala 2.12, we should definitely support it eventually, but I don't think we need to block 2.0 on that because it can be added later too. Has anyone investigated what it would take to run on there? I imagine we don't need many code changes, just maybe some REPL stuff.

Needless to say, but I'm all for the idea of making "major" releases as undisruptive as possible in the model Reynold proposed. Keeping everyone working with the same set of releases is super important.

Matei

On Nov 11, 2015, at 4:58 AM, Sean Owen <so...@cloudera.com> wrote:

On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin <r...@databricks.com> wrote:

"... to the Spark community. A major release should not be very different from a minor release and should not be gated based on new features. The main purpose of a major release is an opportunity to fix things that are broken in the current API and remove certain deprecated APIs (examples follow)."

Agree with this stance. Generally, a major release might also be a time to replace some big old API or implementation with a new one, but I don't see obvious candidates.

I wouldn't mind turning attention to 2.x sooner than later, unless there's a fairly good reason to continue adding features in 1.x to a 1.7 release. The scope as of 1.6 is already pretty darned big.

"1. Scala 2.11 as the default build. We should still support Scala 2.10, but it has been end-of-life."

By the time 2.x rolls around, 2.12 will be the main version, 2.11 will be quite stable, and 2.10 will have been EOL for a while. I'd propose dropping 2.10. Otherwise it's supported for 2 more years.

"2. Remove Hadoop 1 support."

I'd go further to drop support for <2.2 for sure (2.0 and 2.1 were sort of 'alpha' and 'beta' releases) and even <2.6.

I'm sure we'll think of a number of other small things -- shading a bunch of stuff? reviewing and updating dependencies in light of simpler, more recent dependencies to support from Hadoop etc?

Farming out Tachyon to a module? (I felt like someone proposed this?)
Pop out any Docker stuff to another repo?
Continue that same effort for EC2?
Farming out some of the "external" integrations to another repo (? controversial)

See also anything marked version "2+" in JIRA.