Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some people are set in their ways or have practical considerations and don’t care for new shiny stuff.
On 15/5/18, 11:46, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:

>I think we need some path for deprecating old Hadoop versions, the same
>way we deprecate old Java version support or old RDBMS version support.
>At some point the cost of supporting Hadoop 1 exceeds the benefit. The
>same goes for stuff like MR; supporting it, esp. for perf work, becomes a
>burden, and it’s outdated now that there are 2 alternatives, one of which
>has been around for 2 releases.
>The branches are a graceful way to get rid of the legacy burden.
>
>Alternatively, when sweeping changes are made, we can do what HBase did
>(which is not pretty imho), where the 0.94 version had ~30 dot releases
>because people cannot upgrade to the 0.96 “singularity” release.
>
>
>I posit that people who run Hadoop 1 and MR at this day and age (and more
>so as time passes) are people who don’t care about perf and new
>features, only about stability; so a stability-focused branch would be
>perfect to support them.
>
>
>On 15/5/18, 10:04, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:
>
>>Up until recently Hive supported numerous versions of the Hadoop code
>>base with a simple shim layer. I would rather we stick to the shim
>>layer. I think this was easily the best part about Hive: a single
>>release worked well regardless of your Hadoop version. It was also a key
>>element of Hive's success. I do not want to see us have multiple
>>branches.
>>
>>On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>
>>> Thanks for the explanation, Alan!
>>>
>>> While I now understand the proposal better, I actually see more
>>> problems than the confusion of two lines of releases. Essentially, this
>>> proposal forces a user to make a hard choice between a more stable,
>>> legacy-aware release line and an adventurous, pioneering release line.
>>> And once the choice is made, there is no easy way back or forward.
>>>
>>> Here is my interpretation. Let's say we have two main branches as
>>> proposed.
>>> I develop a new feature which I think is useful for both branches, so
>>> I commit it to both branches. My feature requires additional schema
>>> support, so I provide upgrade scripts for both branches. The scripts
>>> are different because the two branches have already diverged in schema.
>>>
>>> Now the two branches evolve in a diverging fashion like this. This is
>>> all good as long as a user stays in his line. The moment the user
>>> considers a switch, most likely from branch-1 to branch-2, he is stuck.
>>> Why? Because there is no upgrade path from a release in branch-1 to a
>>> release in branch-2!
>>>
>>> If we want to provide an upgrade path, then there will be MxN paths,
>>> where M and N are the number of releases in the two branches,
>>> respectively. This is going to be close to a nightmare, not only for
>>> users, but also for us.
>>>
>>> Also, the proposal will require two sets of things that Hive provides:
>>> double documentation, double feature tracking, double build/test
>>> infrastructures, etc.
>>>
>>> This approach can also potentially cause the problem we saw in Hadoop
>>> releases, where the 0.23 release was greater than the 1.0 release.
>>>
>>> To me, the problem we are trying to solve is deprecating old things
>>> such as hadoop-1, Hive CLI, etc. This is a valid problem to be solved.
>>> As I see it, however, we approached the problem in less favorable ways.
>>>
>>> First, it seemed we wanted to deprecate something just for the sake of
>>> deprecation, not based on a rationale that supports the desire.
>>> Developers might write code that accidentally breaks the hadoop-1
>>> build. However, this is more a build infrastructure problem than the
>>> burden of supporting hadoop-1. If our build could catch it at the
>>> precommit test, then I would think the accident can be well avoided.
>>> Most of the time, fixing the build is trivial. And we have already
>>> addressed the build infrastructure problem.
>>>
>>> Secondly, if we do have a strong reason to deprecate something, we
>>> should have a deprecation plan rather than declaring on the spot that
>>> the current release is the last one supporting X. I think Microsoft did
>>> a better job in terms of product deprecation. For instance, they
>>> announced the end of support for Windows XP long before the last day.
>>> In my opinion, we should have a similar vision, giving users and
>>> distributions enough time to adjust rather than shocking them with
>>> breaking news.
>>>
>>> In summary, I do see the need for deprecation in Hive, but I am afraid
>>> the approach we are taking, including the proposal here, isn't going to
>>> solve the problem nicely. On the contrary, I foresee a spectrum of
>>> confusion, frustration, and burden for users as well as for developers.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Fri, May 15, 2015 at 8:19 PM, Alan Gates <alanfga...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> Xuefu Zhang <xzh...@cloudera.com>
>>>> May 15, 2015 at 17:31
>>>>
>>>> Just to make sure that I understand the proposal correctly: we are
>>>> going to have two main branches, one for hadoop-1 and one for
>>>> hadoop-2.
>>>>
>>>> We shouldn't tie this to hadoop-1 and 2. It's about Hive, not Hadoop.
>>>> It will be some time before Hive's branch-2 is stable, while Hadoop-2
>>>> is already well established.
>>>>
>>>> New features are only merged to branch-2. That essentially says we
>>>> stop development for hadoop-1, right?
>>>>
>>>> If developers want to keep contributing patches to branch-1 then
>>>> there's no need for it to stop. We would want to avoid putting new
>>>> features only on branch-1, unless they only made sense in that context.
>>>> But I assume we'll see people contributing to branch-1 for some time.
>>>>
>>>> Are we also making two lines of releases: one for branch-1 and one
>>>> for branch-2? Won't that be confusing and also burdensome if we
>>>> release, say, 1.3, 2.0, 2.1, 1.4...
>>>>
>>>> I'm asserting that it will be less confusing than the alternatives. We
>>>> need some way to make early releases of many of the new features. I
>>>> believe that this proposal is less confusing than if we start putting
>>>> the new features in 1.x branches. This is particularly true because it
>>>> would let us start dropping older functionality like Hadoop-1 and
>>>> MapReduce, which is very hard to do in the 1.x line without stranding
>>>> users.
>>>>
>>>> Please note that we will have hadoop 3 soon. What's the story there?
>>>>
>>>> As I said above, I don't see this as tied to Hadoop versions.
>>>>
>>>> Alan.
>>>>
>>>> Thanks,
>>>> Xuefu
>>>>
>>>>
>>>>
>>>> On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta
>>>> <vgumas...@hortonworks.com> wrote:
>>>>
>>>> +1 on the new branch. I think it’ll help speed up development of these
>>>> important changes.
>>>>
>>>> —Vaibhav
>>>>
>>>> From: Alan Gates <alanfga...@gmail.com>
>>>> Reply-To: "dev@hive.apache.org" <dev@hive.apache.org>
>>>> Date: Friday, May 15, 2015 at 4:11 PM
>>>> To: "dev@hive.apache.org" <dev@hive.apache.org>
>>>> Subject: Re: [DISCUSS] Supporting Hadoop-1 and experimental features
>>>>
>>>> Anyone else have feedback on this? If not I'll start a vote next week.
>>>>
>>>> Alan.
>>>>
>>>> Gopal Vijayaraghavan <gop...@apache.org>
>>>> May 14, 2015 at 10:44
>>>> Hi,
>>>>
>>>> +1 on the idea.
>>>>
>>>> Having a stable release branch with ongoing fixes where we do not drop
>>>> major features would be good all around.
>>>>
>>>> It lets us accelerate the pace of development, drop major features or
>>>> rewrite them entirely without dragging everyone else kicking &
>>>> screaming into that release.
>>>>
>>>> Cheers,
>>>> Gopal
>>>>
>>>>
>>>>
>>>> Sergey Shelukhin <ser...@hortonworks.com>
>>>> May 11, 2015 at 19:17
>>>> That sounds like a good idea.
>>>> Some features could be backported to branch-1 if viable, but at least
>>>> new stuff would not be burdened by Hadoop 1/MR code paths.
>>>> It would probably also be a good place to enable vectorization and
>>>> other perf features by default while we make alpha releases.
>>>>
>>>> +1
>>>>
>>>>
>>>> Alan Gates <alanfga...@gmail.com>
>>>> May 11, 2015 at 15:38
>>>> There is a lot of forward-looking work going on in various branches of
>>>> Hive: LLAP, the HBase metastore, and the work to drop the CLI. It would
>>>> be good to have a way to release this code to users so that they can
>>>> experiment with it. Releasing it will also provide feedback to
>>>> developers.
>>>>
>>>> At the same time there are discussions on whether to keep supporting
>>>> Hadoop-1. Supporting older, less used functionality such as Hadoop-1
>>>> is becoming ever more burdensome as many new features are added.
>>>>
>>>> I propose that the best way to deal with this would be to make a
>>>> branch-1. We could continue to make new feature releases off of this
>>>> branch (1.3, 1.4, etc.). This branch would not drop old functionality.
>>>> This provides stability and continuity for users and developers.
>>>>
>>>> We could then merge these new feature branches (LLAP, HBase metastore,
>>>> CLI drop) into trunk, as well as turn on by default newer features
>>>> such as vectorization and ACID. We could also drop older, less used
>>>> features such as support for Hadoop-1 and MapReduce. It will be a
>>>> while before we are ready to make stable, production-ready releases of
>>>> this code. But we could start making alpha quality releases soon.
>>>> We would call these releases 2.x, to stress the non-backward
>>>> compatible changes such as dropping Hadoop-1. This will give users a
>>>> chance to play with the new code and developers a chance to get
>>>> feedback.
>>>>
>>>> Thoughts?