Re: [DISCUSS] Supporting Hadoop-1 and experimental features

Sergey Shelukhin Mon, 18 May 2015 11:49:20 -0700

Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some
people are set in their ways or have practical considerations and don’t
care for new shiny stuff.


On 15/5/18, 11:46, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:

>I think we need some path for deprecating old Hadoop versions, the same
>way we deprecate old Java version support or old RDBMS version support.
>At some point the cost of supporting Hadoop 1 exceeds the benefit. Same
>goes for stuff like MR; supporting it, esp. for perf work, becomes a
>burden, and it’s outdated with 2 alternatives, one of which has been
>around for 2 releases.
>The branches are a graceful way to get rid of the legacy burden.
>
>Alternatively, when sweeping changes are made, we can do what HBase did
>(which is not pretty imho), where the 0.94 version had ~30 dot releases
>because people cannot upgrade to the 0.96 “singularity” release.
>
>
>I posit that people who run Hadoop 1 and MR in this day and age (and more
>so as time passes) are people who don’t care about perf and new
>features, only stability; so a stability-focused branch would be perfect to
>support them.
>
>
>On 15/5/18, 10:04, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:
>
>>Up until recently Hive supported numerous versions of the Hadoop code base
>>with a simple shim layer. I would rather we stick to the shim layer. I think
>>easily the best part about Hive was that a single release worked well
>>regardless of your Hadoop version. It was also a key element of Hive's
>>success. I do not want to see us have multiple branches.
>>
>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>
>>> Thanks for the explanation, Alan!
>>>
>>> While I have understood more on the proposal, I actually see more
>>>problems
>>> than the confusion of two lines of releases. Essentially, this proposal
>>> forces a user to make a hard choice between a stabler, legacy-aware
>>>release
>>> line and an adventurous, pioneering release line. And once the choice
>>>is
>>> made, there is no easy way back or forward.
>>>
>>> Here is my interpretation. Let's say we have two main branches as
>>> proposed. I develop a new feature which I think useful for both
>>>branches.
>>> So, I commit it to both branches. My feature requires additional schema
>>> support, so I provide upgrade scripts for both branches. The scripts
>>>are
>>> different because the two branches have already diverged in schema.
>>>
>>> Now the two branches evolve in a diverging fashion like this. This is all
>>> good as long as a user stays in his line. The moment the user considers a
>>> switch, most likely from branch-1 to branch-2, he is stuck. Why? Because
>>> there is no upgrade path from a release in branch-1 to a release in
>>> branch-2!
>>>
>>> If we want to provide an upgrade path, then there will be MxN paths,
>>>where
>>> M and N are the number of releases in the two branches, respectively.
>>>This
>>> is going to be next to a nightmare, not only for users, but also for
>>>us.
>>>
>>> Also, the proposal will require two sets of things that Hive provides:
>>> double documentation, double feature tracking, double build/test
>>> infrastructures, etc.
>>>
>>> This approach can also potentially cause the problem we saw in hadoop
>>> releases, where the 0.23 release was greater than the 1.0 release.
>>>
>>> To me, the problem we are trying to solve is deprecating old things such
>>> as hadoop-1, Hive CLI, etc. This is a valid problem to be solved. As I see
>>> it, however, we have approached the problem in less favorable ways.
>>>
>>> First, it seemed we wanted to deprecate something just for the sake of
>>> deprecation, and it's not based on a rationale that supports the desire.
>>> Devs might write code that accidentally breaks the hadoop-1 build. However,
>>> this is more a build infrastructure problem than a burden of supporting
>>> hadoop-1. If our build could catch it at precommit test, then I would think
>>> the accident can be well avoided. Most of the time, fixing the build is
>>> trivial. And we have already addressed the build infrastructure problem.
>>>
>>> Secondly, if we do have a strong reason to deprecate something, we should
>>> have a deprecation plan rather than declaring on the spot that the current
>>> release is the last one supporting X. I think Microsoft did a better job in
>>> terms of product deprecation. For instance, they announced the end of
>>> support for Windows XP long before the last day. In my opinion, we should
>>> have a similar vision, giving users and distributions enough time to adjust
>>> rather than shocking them with breaking news.
>>>
>>> In summary, I do see the need of deprecation in Hive, but I am afraid
>>>the
>>> way we take, including the proposal here, isn't going to nicely solve
>>>the
>>> problem. On the contrary, I foresee a spectrum of confusion,
>>>frustration,
>>> and burden for the user as well as for developers.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com>
>>>wrote:
>>>
>>>>
>>>>
>>>>   Xuefu Zhang <xzh...@cloudera.com>
>>>>  May 15, 2015 at 17:31
>>>>
>>>> Just make sure that I understand the proposal correctly: we are going
>>>>to
>>>> have two main branches, one for hadoop-1 and one for hadoop-2.
>>>>
>>>>  We shouldn't tie this to hadoop-1 and 2.  It's about Hive not Hadoop.
>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
>>>>is
>>>> already well established.
>>>>
>>>>  New features
>>>> are only merged to branch-2. That essentially says we stop development
>>>>for
>>>> hadoop-1, right?
>>>>
>>>>  If developers want to keep contributing patches to branch-1 then
>>>> there's no need for it to stop.  We would want to avoid putting new
>>>> features only on branch-1, unless they only made sense in that
>>>>context.
>>>> But I assume we'll see people contributing to branch-1 for some time.
>>>>
>>>>  Are we also making two lines of releases: one for branch-1
>>>> and one for branch-2? Won't that be confusing and also burdensome if we
>>>> release say 1.3, 2.0, 2.1, 1.4...
>>>>
>>>>  I'm asserting that it will be less confusing than the alternatives.
>>>>We
>>>> need some way to make early releases of many of the new features.  I
>>>> believe that this proposal is less confusing than if we start putting
>>>>the
>>>> new features in 1.x branches.  This is particularly true because it
>>>>would
>>>> help us to start being able to drop older functionality like Hadoop-1
>>>>and
>>>> MapReduce, which is very hard to do in the 1.x line without stranding
>>>>users.
>>>>
>>>>  Please note that we will have hadoop 3 soon. What's the story there?
>>>>
>>>>  As I said above, I don't see this as tied to Hadoop versions.
>>>>
>>>> Alan.
>>>>
>>>>  Thanks,
>>>> Xuefu
>>>>
>>>>
>>>>
>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
>>>> <vgumas...@hortonworks.com> wrote:
>>>>
>>>>  +1 on the new branch. I think it’ll help in faster dev time for these
>>>> important changes.
>>>>
>>>>  —Vaibhav
>>>>
>>>>   From: Alan Gates <alanfga...@gmail.com>
>>>> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
>>>> Date: Friday, May 15, 2015 at 4:11 PM
>>>> To: "dev@hive.apache.org" <dev@hive.apache.org>
>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>>
>>>>  Anyone else have feedback on this?  If not I'll start a vote next
>>>>week.
>>>>
>>>> Alan.
>>>>
>>>>    Gopal Vijayaraghavan <gop...@apache.org>
>>>> May 14, 2015 at 10:44
>>>>   Hi,
>>>>
>>>> +1 on the idea.
>>>>
>>>> Having a stable release branch with ongoing fixes where we do not drop
>>>> major features would be good all around.
>>>>
>>>> It lets us accelerate the pace of development, drop major features or
>>>> rewrite them entirely without dragging everyone else kicking &
>>>>screaming
>>>> into that release.
>>>>
>>>> Cheers,
>>>> Gopal
>>>>
>>>>
>>>>
>>>>    Sergey Shelukhin <ser...@hortonworks.com>
>>>> May 11, 2015 at 19:17
>>>>   That sounds like a good idea.
>>>> Some features could be back ported to branch-1 if viable, but at least
>>>>new
>>>> stuff would not be burdened by Hadoop 1/MR code paths.
>>>> Probably also a good place to enable vectorization and other perf
>>>>features
>>>> by default while we make alpha releases.
>>>>
>>>> +1
>>>>
>>>>
>>>>    Alan Gates <alanfga...@gmail.com>
>>>> May 11, 2015 at 15:38
>>>>   There is a lot of forward-looking work going on in various branches
>>>>of
>>>> Hive:  LLAP, the HBase metastore, and the work to drop the CLI.  It
>>>>would
>>>> be good to have a way to release this code to users so that they can
>>>> experiment with it.  Releasing it will also provide feedback to
>>>>developers.
>>>>
>>>> At the same time there are discussions on whether to keep supporting
>>>> Hadoop-1.  The burden of supporting older, less used functionality such as
>>>> Hadoop-1 is becoming ever heavier as many new features are added.
>>>>
>>>> I propose that the best way to deal with this would be to make a
>>>> branch-1.  We could continue to make new feature releases off of this
>>>> branch (1.3, 1.4, etc.).  This branch would not drop old
>>>>functionality.
>>>> This provides stability and continuity for users and developers.
>>>>
>>>> We could then merge these new feature branches (LLAP, HBase metastore,
>>>> CLI drop) into the trunk, as well as turn on by default newer features such
>>>> as vectorization and ACID.  We could also drop older, less used
>>>> features such as support for Hadoop-1 and MapReduce.  It will be a
>>>>while
>>>> before we are ready to make stable, production ready releases of this
>>>> code.  But we could start making alpha quality releases soon.  We
>>>>would
>>>> call these releases 2.x, to stress the non-backward compatible changes
>>>>such
>>>> as dropping Hadoop-1.  This will give users a chance to play with the
>>>>new
>>>> code and developers a chance to get feedback.
>>>>
>>>> Thoughts?
>>>
>
