Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Arun Murthy Mon, 09 Mar 2015 15:56:19 -0700

Colin,

Do you have a list of incompatible changes other than the shell-script 
rewrite? If we do have others, we'd have to fix them anyway for the current plan 
on hadoop-3.x, right? So I don't see the difference.


Arun

________________________________________
From: Colin P. McCabe <cmcc...@apache.org>
Sent: Monday, March 09, 2015 3:05 PM
To: hdfs-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

Java 7 will be end-of-lifed in April 2015.  I think it would be unwise
to plan a new Hadoop release against a version of Java that is almost
obsolete and (soon) no longer receiving security updates.  I think
people will be willing to roll out a new version of Java for Hadoop
3.x.

Similarly, the whole point of bumping the major version number is the
ability to make incompatible changes.  There are already a bunch of
incompatible changes in the trunk branch.  Are you proposing to revert
those?  Or push them into newly created feature branches?  This
doesn't seem like a good idea to me.

I would be in favor of backporting targeted incompatible changes from
trunk to branch-2.  For example, we could consider pulling in Allen's
shell script rewrite.  But pulling in all of trunk seems like a bad
idea at this point, if we want a 2.x release.

best,
Colin

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect 
> anyone to want to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev pull 
> for this, there's ops pull against this: people are still in the moving-off 
> Java 6 phase due to that "it's working, don't update it" philosophy. Java 8 
> is compelling to us coders, but that doesn't mean ops want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*, the 
> main thing is setting up JAVA_HOME. That's something we could make easier 
> somehow (maybe some min Java version field in resource requests that will let 
> apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could 
> fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java 8 
> code. Downstream code can do that (Hive, etc); they just need to accept that 
> they don't get to play on JDK7 clusters if they embrace lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream apps 
> get to choose what they want. We can/could enhance YARN to make JVM choice 
> more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I 
> can't say whether this is something that could be made optional. If it isn't, 
> then it is a python-3-class "rewrite your code" event, which is going 
> to be particularly traumatic to things like Hive that already do complex CP 
> games. I'm currently against any mandatory change here, though would love to 
> see an optional one. And if optional, it ceases to be an incompatible 
> change...
>
> Issue: Getting trunk out the door
>
> The main diff between branch-2 and trunk is currently the bash script changes. 
> These don't break client apps. They may or may not break bigtop & other 
> downstream hadoop stacks, but developers don't need to worry about this: no 
> recompilation necessary.
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
>     git checkout trunk
>     mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year, 
> compatible at the JDK and API level with the existing java code & JDK7+ 
> clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x line, 
> saving the 3.x tag for something that really breaks things, forces all 
> downstream apps to set up new hadoop profiles, have separate modules & 
> generally hate the hadoop dev team.
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts" 
> items, pushing out those benefits to people sooner rather than later, and 
> puts off the "Hello, we've just broken your code" event for another 12+ 
> months.
>
> Comments?
>
> -Steve
>
>
>
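
For reference, the fail-fast idea in Steve's mail (an app declares a minimum Java version and the launch is rejected if the JVM is older) could look roughly like the sketch below. This is only an illustration: the class MinJavaVersionCheck and its methods are hypothetical names, not an existing YARN or Hadoop API, and how the minimum version would actually travel in a resource request is left open. The only real API used is the standard java.version system property.

```java
/**
 * Hypothetical sketch, not a YARN API: compares a declared minimum Java
 * major version against a JVM version string and fails fast on a mismatch.
 */
final class MinJavaVersionCheck {

    private MinJavaVersionCheck() {}

    /**
     * Extracts the major version from a java.version-style string.
     * Legacy "1.x" strings ("1.7.0_75") map to x; newer strings
     * ("9", "11.0.2") map to their first component.
     */
    static int majorVersion(String version) {
        String[] parts = version.trim().split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Throws if the JVM described by jvmVersion is older than requiredMajor. */
    static void requireAtLeast(int requiredMajor, String jvmVersion) {
        int actual = majorVersion(jvmVersion);
        if (actual < requiredMajor) {
            throw new IllegalStateException(
                "Application requires Java " + requiredMajor
                + " but the JVM reports " + jvmVersion);
        }
    }

    public static void main(String[] args) {
        // Assumed convention for this sketch: the minimum major version
        // is passed as the first argument, defaulting to 7.
        int required = args.length > 0 ? Integer.parseInt(args[0]) : 7;
        requireAtLeast(required, System.getProperty("java.version"));
        System.out.println("JVM satisfies minimum Java " + required);
    }
}
```

Run on a JDK 7 node with the argument 8, this would throw before any application code starts, which is the "fail-fast" behaviour described in the quoted text.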
