> On 14 Jan 2016, at 02:17, Sean Owen <so...@cloudera.com> wrote:
> 
> I personally support this. I had suggested drawing the line at Hadoop
> 2.6, but that's minor. More info:
> 
> Hadoop 2.7: April 2015
> Hadoop 2.6: Nov 2014
> Hadoop 2.5: Aug 2014
> Hadoop 2.4: April 2014
> Hadoop 2.3: Feb 2014
> Hadoop 2.2: Oct 2013
> 
> CDH 5.0/5.1 = Hadoop 2.3 + backports
> CDH 5.2/5.3 = Hadoop 2.5 + backports
> CDH 5.4+ = Hadoop 2.6 + chunks of 2.7 + backports.
> 
> I can only imagine that CDH6 this year will be based on something
> later still like 2.8 (no idea about the 3.0 schedule).

Hadoop 2.8 comes out in ~1-2 months. I've already been building & testing Spark 
against it; no major issues.

> In the sense
> that 5.2 was released about a year and a half ago, yes, this vendor has
> moved on from 2.3 a while ago. These releases will also never contain
> a different minor Spark release. For example 5.7 will have Spark 1.6,
> I believe, and not 2.0.
> 
> Here, I listed some additional things we could clean up in Spark if
> Hadoop 2.6 was assumed. By itself, not a lot:
> https://github.com/apache/spark/pull/10446#issuecomment-167971026
> 


> Yes, we also get less Jenkins complexity. Mostly, the jar-hell that's
> biting now gets a little more feasible to fix. And we get Hadoop fixes
> as well as new APIs, which helps mostly for YARN.
> 

2.6.x is still getting active releases, likely through 2016. It'll be the only 
Hadoop version where problems Spark encounters would actually get fixed.

It's also the last iteration of interesting API features, especially in YARN: 
the timeline server, the registry, and various other things.

And it has s3a which, for anyone using S3 for storage, is the only S3 
filesystem binding I'd recommend. Hadoop 2.4 only has s3n, and a broken one at 
that (HADOOP-10589).
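
For anyone who wants to try it: with a 2.6+ build and the hadoop-aws + AWS SDK 
jars on the classpath, switching over is mostly a matter of the URI scheme and 
the fs.s3a.* settings. Rough sketch in Scala (bucket, path and keys are made 
up; credentials can equally come from env vars or IAM roles):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("s3a-check").setMaster("local[*]"))

    // s3a reads credentials from fs.s3a.* (placeholder values here)
    sc.hadoopConfiguration.set("fs.s3a.access.key", "ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "SECRET_KEY")

    // same RDD calls as before; only the scheme changes from s3n:// to s3a://
    val lines = sc.textFile("s3a://some-bucket/logs/*.gz")
    println(lines.count())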

I believe 2.6 supports recent Guava versions, even if it is frozen on 11.0 to 
avoid surprising people (i.e. its own use of classes deprecated or removed in 
later Guava releases should have been stripped).
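
To show the kind of removal I mean: IIRC the public Stopwatch constructors were 
dropped around Guava 17, so code still doing new Stopwatch() against Guava 11 
won't compile or link against a recent Guava; the factory method is the form 
recent releases expect. A minimal, purely illustrative snippet:

    import java.util.concurrent.TimeUnit
    import com.google.common.base.Stopwatch

    // new Stopwatch() is gone in recent Guava; createStarted() replaces it
    val sw = Stopwatch.createStarted()
    Thread.sleep(50)
    println(s"elapsed: ${sw.elapsed(TimeUnit.MILLISECONDS)} ms")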

Finally: it's the only version of Hadoop that works on current Java 7 releases 
and has the patches to support Java 8 + Kerberos (in fact, Java 7u80+ and 
Kerberos).

For the JVM and Guava reasons alone, I'd abandon Hadoop < 2.6. Those versions 
won't work on secure clusters running recent Java 7 releases, won't work with 
recent Guava versions, and have lots of uncorrected issues.

Oh, and did I mention the test matrix? The later the version of Hadoop you 
use, the fewer versions there are to test against.

> My general position is that backwards-compatibility and supporting
> older platforms need to be a low priority in a major release; it's a
> decision about what to support for users in the next couple years, not
> the preceding couple years. Users on older technologies simply stay on
> the older Spark until ready to update; they are in no sense suddenly
> left behind otherwise.


If they are running older versions of Hadoop, they generally have stable apps 
which they don't bother upgrading. New clusters => new versions => new apps.

