Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Alan Gates Fri, 22 May 2015 13:20:06 -0700

I agree with *All* features with the exception that some features might be branch-1 specific (if it's a feature on something no longer supported in master, like hadoop-1). Without this we prevent new features for older technology, which doesn't strike me as reasonable.

I see your point on saying the contributor may not understand where best to put the patch, and thus the committer decides. However, it would be very disappointing for a contributor who uses branch-1 to build a new feature only to have the committer put it only in master. So I would modify your modification to say "at the discretion of the contributor and Hive committers".


Alan.

[email protected] <mailto:[email protected]>
May 22, 2015 at 11:41
+1 on the new proposal. Feedback below:
> New features must be put into master. Whether to put them into branch-1 is at the discretion of the developer.
How about we change this to "*_All_* features must be put into master. Whether to put them into branch-1 is at the discretion of the *_committer_*." The reason, I think, is that going forward, for us to sustain as a happy and healthy community, it's imperative for us to make it easy not only for the users, but also for developers and committers to contribute/commit patches. To me, as a hive contributor, it would be hard to determine which branch my code belongs in. Also, IMO (and I might be wrong), many committers have their own areas of expertise and it's also very hard for them to immediately determine what branch a patch should go to unless it is very well documented somewhere. Putting all code into master would be an easy approach to follow, and then cherry picking to other branches can be done. So even if people forget to do that, we can always go back to master and port the patches out to these branches. So we have a master branch, a branch-1 for stable code, branch-2 for experimental and "bleeding edge" code and so on. Once branch-2 is stable, we deprecate branch-1, create branch-3 and move on.
Another reason I say this is because, in my experience, a pretty significant amount of work in hive is still bug fixes and I think that is what the user cares most about (correctness above anything else). So with this approach, it might be very obvious what branches to commit this to.
--
Swarnim
Chris Drome <mailto:[email protected]>
May 22, 2015 at 0:49
I understand the motivation and benefits of creating a branch-2 where more disruptive work can go on without affecting branch-1. While not necessarily against this approach, from Yahoo's standpoint, I do have some questions (concerns).
Upgrading to a new version of Hive requires a significant commitment of time and resources to stabilize and certify a build for deployment to our clusters. Given the size of our clusters and scale of datasets, we have to be particularly careful about adopting new functionality. However, at the same time we are interested in new testing and making available new features and functionality. That said, we would have to rely on branch-1 for the immediate future.
One concern is that branch-1 would be left to stagnate, at which point there would be no option but for users to move to branch-2 as branch-1 would be effectively end-of-lifed. I'm not sure how long this would take, but it would eventually happen as a direct result of the very reason for creating branch-2.
A related concern is how disruptive the code changes will be in branch-2. I imagine that changes early in branch-2 will be easy to backport to branch-1, while this effort will become more difficult, if not impractical, as time goes on. If the code bases diverge too much then this could lead to more pressure for users of branch-1 to add features just to branch-1, which has been mentioned as undesirable. By the same token, backporting any code in branch-2 will require an increasing amount of effort, which contributors to branch-2 may not be interested in committing to.
These questions affect us directly because, while we require a certain amount of stability, we also like to pull in new functionality that will be of value to our users. For example, our current 0.13 release is probably closer to 0.14 at this point. Given the lifespan of a release, it is often more palatable to backport features and bugfixes than to jump to a new version.
The good thing about this proposal is the opportunity to evaluate and clean up a lot of the old code.
Thanks,
chris

Sergey Shelukhin <mailto:[email protected]>
May 18, 2015 at 11:47
Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.


Sergey Shelukhin <mailto:[email protected]>
May 18, 2015 at 11:46
I think we need some path for deprecating old Hadoop versions, the same
way we deprecate old Java version support or old RDBMS version support.
At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
goes for stuff like MR; supporting it, esp. for perf work, becomes a
burden, and it’s outdated with 2 alternatives, one of which has been
around for 2 releases.
The branches are a graceful way to get rid of the legacy burden.

Alternatively, when sweeping changes are made, we can do what HBase did
(which is not pretty imho), where the 0.94 version had ~30 dot releases
because people cannot upgrade to the 0.96 “singularity” release.


I posit that people who run Hadoop 1 and MR in this day and age (and more
so as time passes) are people who don’t care about perf and new
features, only stability; so, a stability-focused branch would be perfect to
support them.



Edward Capriolo <mailto:[email protected]>
May 18, 2015 at 10:04
Up until recently Hive supported numerous versions of the Hadoop code base with
a simple shim layer. I would rather we stick to the shim layer. I think
easily the best part about hive was that a single release worked
well regardless of your hadoop version. It was also a key element to hive's
success. I do not want to see us have multiple branches.
