Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Alan Gates Mon, 18 May 2015 11:46:45 -0700

Edward Capriolo <mailto:edlinuxg...@gmail.com>
May 18, 2015 at 10:14
This concept of "experimental features" basically translates to "I do not
have the time to care about people not using my version".

No, it does not. Continuing to support old features is a cost/benefit trade off, both for developers and users. The cost for developers is continuing to work around older code; the cost for users is that they get fewer new features, fewer performance improvements, and fewer stability improvements because developers are spending time working around the old code.

At some point in the cost/benefit analysis the costs are high enough that it makes sense to stop supporting it. I am asserting that we are at that point.

Caring about people not on the latest version is an important part of what I am proposing. There are still many users using Hive either on Hadoop 1 or for more traditional Hive workloads (batch, ETL). It is important to give these users a good path forward. My assertion is that a branch-1 is the best way to do this.

So to continue in the cost/benefit paradigm, what I have proposed does have an additional cost for developers. As I have said in my responses to Xuefu, I don't think these are too bad, and I assert that they are less than continuing to carry forward older functionality ad infinitum. My intent is that for users who are not interested in new features or workloads the cost is at or near zero. Customers interested in newer functionality will continue to have to pay the cost of upgrades, but that is true anyway.


Alan.

I do not see it
as good. We have seen what happened to upstream hadoop: there was this gap
between 0.21 and ??.....??. No one was clear what the API was (mapred,
new mapreduce), and no one knew what to link off of: cdh? vanilla? yahoo
distribution?

IMHO. This is just going to increase fragmentation.

On Mon, May 18, 2015 at 1:04 PM, Edward Capriolo <edlinuxg...@gmail.com>

Edward Capriolo <mailto:edlinuxg...@gmail.com>
May 18, 2015 at 10:04
Up until recently Hive supported numerous versions of the Hadoop code base with
a simple shim layer. I would rather we stick to the shim layer. I think
this was easily the best part about hive: a single release worked
well regardless of your hadoop version. It was also a key element to hive's
success. I do not want to see us have multiple branches.


Xuefu Zhang <mailto:xzh...@cloudera.com>
May 15, 2015 at 22:29
Thanks for the explanation, Alan!
While I have understood more on the proposal, I actually see more problems than the confusion of two lines of releases. Essentially, this proposal forces a user to make a hard choice between a stabler, legacy-aware release line and an adventurous, pioneering release line. And once the choice is made, there is no easy way back or forward.
Here is my interpretation. Let's say we have two main branches as proposed. I develop a new feature which I think is useful for both branches. So, I commit it to both branches. My feature requires additional schema support, so I provide upgrade scripts for both branches. The scripts are different because the two branches have already diverged in schema.
Now the two branches evolve in a diverging fashion like this. This is all good as long as a user stays in his line. The moment the user considers a switch, most likely from branch-1 to branch-2, he is stuck. Why? Because there is no upgrade path from a release in branch-1 to a release in branch-2!
If we want to provide an upgrade path, then there will be MxN paths, where M and N are the number of releases in the two branches, respectively. This is going to be next to a nightmare, not only for users, but also for us.
Also, the proposal will require two sets of things that Hive provides: double documentation, double feature tracking, double build/test infrastructures, etc.
This approach can also potentially cause the problem we saw in hadoop releases, where 0.23 release was greater than 1.0 release.
To me, the problem we are trying to solve is deprecating old things such as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see it, however, we approached the problem in less favorable ways.
First, it seemed we wanted to deprecate something just for the sake of deprecation, and it's not based on a rationale that supports the desire. Devs might write code that accidentally breaks the hadoop-1 build. However, this is more a build infrastructure problem than a burden of supporting hadoop-1. If our build could catch it at the precommit test, then I would think the accident can be well avoided. Most of the time, fixing the build is trivial. And we have already addressed the build infrastructure problem.
Secondly, if we do have a strong reason to deprecate something, we should have a deprecation plan rather than declaring on the spot that the current release is the last one supporting X. I think Microsoft did a better job in terms of product deprecation. For instance, they announced the desupport of Windows XP long before the last day. In my opinion, we should have a similar vision, giving users and distributions enough time to adjust rather than shocking them with breaking news.
In summary, I do see the need of deprecation in Hive, but I am afraid the way we take, including the proposal here, isn't going to nicely solve the problem. On the contrary, I foresee a spectrum of confusion, frustration, and burden for the user as well as for developers.
Thanks,
Xuefu


Xuefu Zhang <mailto:xzh...@cloudera.com>
May 15, 2015 at 17:31
Just to make sure that I understand the proposal correctly: we are going to
have two main branches, one for hadoop-1 and one for hadoop-2. New features
are only merged to branch-2. That essentially says we stop development for
hadoop-1, right? Are we also making two lines of releases: one for branch-1
and one for branch-2? Won't that be confusing and also burdensome if we
release say 1.3, 2.0, 2.1, 1.4...

Please note that we will have hadoop 3 soon. What's the story there?

Thanks,
Xuefu



On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta <vgumas...@hortonworks.com>
wrote:
  +1 on the new branch. I think it’ll help in faster dev time for these
important changes.

  —Vaibhav

   From: Alan Gates<alanfga...@gmail.com>
Reply-To: "dev@hive.apache.org"<dev@hive.apache.org>
Date: Friday, May 15, 2015 at 4:11 PM
To: "dev@hive.apache.org"<dev@hive.apache.org>
Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features

  Anyone else have feedback on this?  If not I'll start a vote next week.

Alan.

    Gopal Vijayaraghavan<gop...@apache.org>
May 14, 2015 at 10:44
   Hi,

+1 on the idea.

Having a stable release branch with ongoing fixes where we do not drop
major features would be good all around.

It lets us accelerate the pace of development, drop major features or
rewrite them entirely without dragging everyone else kicking & screaming
into that release.

Cheers,
Gopal



    Sergey Shelukhin<ser...@hortonworks.com>
May 11, 2015 at 19:17
   That sounds like a good idea.
Some features could be back ported to branch-1 if viable, but at least new
stuff would not be burdened by Hadoop 1/MR code paths.
Probably also a good place to enable vectorization and other perf features
by default while we make alpha releases.

+1


    Alan Gates<alanfga...@gmail.com>
May 11, 2015 at 15:38
   There is a lot of forward-looking work going on in various branches of
Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It would
be good to have a way to release this code to users so that they can
experiment with it.  Releasing it will also provide feedback to developers.

At the same time there are discussions on whether to keep supporting
Hadoop-1.  The burden of supporting older, less used functionality such as
Hadoop-1 is becoming ever heavier as many new features are added.

I propose that the best way to deal with this would be to make a
branch-1.  We could continue to make new feature releases off of this
branch (1.3, 1.4, etc.).  This branch would not drop old functionality.
This provides stability and continuity for users and developers.

We could then merge these new feature branches (LLAP, HBase metastore,
CLI drop) into the trunk, as well as turn on by default newer features such
as vectorization and ACID.  We could also drop older, less used
features such as support for Hadoop-1 and MapReduce.  It will be a while
before we are ready to make stable, production ready releases of this
code.  But we could start making alpha quality releases soon.  We would
call these releases 2.x, to stress the non-backward compatible changes such
as dropping Hadoop-1.  This will give users a chance to play with the new
code and developers a chance to get feedback.

Thoughts?
