On Thu, Jan 11, 2018 at 6:34 PM Tsz Wo Sze <szets...@yahoo.com> wrote:
> The question is: how are we going to fix it?

What do you propose? -C

> No incompatible changes are allowed between 3.0.0 and 3.0.1. Dot releases only allow bug fixes.
>
> We may not like the statement above but it is our compatibility policy. We should either follow the policy or revise it.
>
> Some more questions:
>
> - What if someone is already using 3.0.0 and has changed all the scripts to 9820? Just let them fail?
> - Compared to 2.x, 3.0.0 has many incompatible changes. Are we going to have other incompatible changes in future minor and dot releases? What are the criteria to decide which incompatible changes are allowed?
> - I hate that we have prematurely released 3.0.0 and are making 3.0.1 incompatible with 3.0.0. If the "bug" is that serious, why not fix it in 4.0.0 and declare 3.x dead?
> - It seems obvious that no one has seriously tested it, since the problem was not uncovered until now. Are there bugs in our current release procedure?
>
> Thanks
> Tsz-Wo
>
> On Thursday, January 11, 2018, 11:36:33 AM GMT+8, Chris Douglas <cdoug...@apache.org> wrote:
>
> Isn't this limited to reverting the 8020 -> 9820 change? -C
>
> On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <ey...@hortonworks.com> wrote:
>
> > The fix in HDFS-9427 can potentially bring in new customers because there is less chance of a newcomer encountering the “port already in use” problem. If we make the change proposed in HDFS-12990, this incompatible change does not make the earlier incompatible change compatible. The other ports are not reverted under HDFS-12990. Users will be left with the bad taste in the mouth that HDFS-9427 attempted to address. Please consider both the negative side effects of reverting and the incompatible change in a minor release. Thanks.
> >
> > Regards,
> > Eric
> >
> > From: larry mccay <lmc...@apache.org>
> > Date: Wednesday, January 10, 2018 at 10:53 AM
> > To: Daryn Sharp <da...@oath.com>
> > Cc: "Aaron T. Myers" <a...@apache.org>, Eric Yang <ey...@hortonworks.com>, Chris Douglas <cdoug...@apache.org>, Hadoop Common <common-dev@hadoop.apache.org>
> > Subject: Re: When are incompatible changes acceptable (HDFS-12990)
> >
> > On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:
> >
> > I fully agree the port changes should be reverted. Although "incompatible", the potential impact to existing 2.x deploys is huge. I'd rather inconvenience 3.0 deploys that comprise <1% of customers. An incompatible change to revert an incompatible change is called compatibility.
> >
> > +1
> >
> > Most importantly, consider that there is no good upgrade path for existing deploys, esp. large and/or multi-cluster environments. It’s only feasible for first-time deploys or simple single-cluster upgrades willing to take downtime. Let's consider a few reasons why:
> >
> > 1. RU is completely broken. Running jobs will fail. If MR on hdfs bundles the configs, there's no way to transparently coordinate the switch to the new bundle with the port changed. Job submissions will fail.
> >
> > 2. Users generally do not add the rpc port number to uris, so unless their configs are updated they will contact the wrong port. Seamlessly coordinating the conf change without massive failures is impossible.
> >
> > 3. Even if client confs are updated, they will break in a multi-cluster env with NNs using different ports. Users/services will be forced to add the port. The cited hive "issue" is not a bug since it's the only way to work in a multi-port env.
> >
> > 4. Coordinating the port add/change of uris in systems everywhere (you know something will be missed), updating of confs, restarting all services, and requiring customers to redeploy their workflows in sync with the NN upgrade will cause mass disruption and downtime that will be unacceptable for production environments.
> >
> > This is a solution to a non-existent problem. Ports can be bound by multiple processes but only 1 can listen. Maybe multiple listeners is an issue for compute nodes but not responsibly managed service nodes. I.e., who runs arbitrary services on the NNs that bind to random ports? Besides, the default port is and was ephemeral so it solved nothing.
> >
> > This either standardizes ports to a particular customer's ports or is a poorly thought out whim. In either case, the needs of the many outweigh the needs of the few/none (3.0 users). The only logical conclusion is revert. If a particular site wants to change default ports and deal with the massive fallout, they can explicitly change the ports themselves.
> >
> > Daryn
> >
> > On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <a...@apache.org> wrote:
> > On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
> >
> > > While I agree the original port change was unnecessary, I don’t think the Hadoop NN port change is a bad thing.
> > >
> > > I worked for a Hadoop distro whose NN RPC port defaulted to 9000. When we migrated from BigInsights to IOP and now to HDP, we had to move customer Hive metadata to the new NN RPC port. It only took one developer (myself) to write the tool for the migration. The workload incurred was not as bad as most people anticipated because Hadoop depends on the configuration file for referencing the namenode. Most of the code can work transparently. It helped to harden the downstream testing tools to be more robust.
> >
> > While there are of course ways to deal with this, the question really should be whether or not it's a desirable thing to do to our users.
> >
> > > We will never know how many people are actively working on Hadoop 3.0.0. Perhaps a couple hundred developers, or thousands.
> >
> > You're right that we can't know for sure, but I strongly suspect that this is a substantial overestimate. Given how conservative Hadoop operators tend to be, I view it as exceptionally unlikely that many deployments have been created on or upgraded to Hadoop 3.0.0 since it was released less than a month ago.
> >
> > Further, I hope you'll agree that the number of users/developers/deployments/applications which are currently on Hadoop 2.x is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so quickly. When all of those users upgrade to any 3.x version, they will encounter this needless incompatible change and be forced to work around it.
> >
> > > I think the switch back may save a few developers work, but more people could be impacted by an unexpected minor release change in the future. I recommend keeping the current values to avoid rule bending and future frustrations.
> >
> > That we allow this incompatible change now does not mean that we are categorically allowing more incompatible changes in the future. My point is that we should in all instances evaluate the merit of any incompatible change on a case-by-case basis. This is not an exceptional circumstance - we've made incompatible changes in the past when appropriate, e.g. breaking some clients to address a security issue. I and others believe that in this case the benefits greatly outweigh the downsides of changing this back to what it has always been.
> >
> > Best,
> > Aaron
> >
> > > Regards,
> > > Eric
> > >
> > > On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:
> > >
> > > Particularly since 9820 isn't in the contiguous range of ports in HDFS-9427, is there any value in this change?
> > >
> > > Let's change it back to prevent the disruption to users, but downstream projects should treat this as a bug in their tests. Please open JIRAs in affected projects. -C
> > >
> > > On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org> wrote:
> > > > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org> wrote:
> > > >
> > > >> Thanks a lot for the response, Larry. Comments inline.
> > > >>
> > > >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org> wrote:
> > > >>
> > > >>> Question...
> > > >>>
> > > >>> Can this be addressed in some way during or before upgrade that allows it to only affect new installs?
> > > >>> Even a config based workaround prior to upgrade might make this change less disruptive.
> > > >>>
> > > >>> If part of the upgrade process includes a step (maybe even a script) to set the NN RPC port explicitly beforehand then it would allow existing deployments and related clients to remain whole - otherwise they will take up the new default port.
> > > >>
> > > >> Perhaps something like this could be done, but I think there are downsides to anything like this. For example, I'm sure there are plenty of applications written on top of Hadoop that have tests which hard-code the port number. Nothing we do in a setup script will help here. If we don't change the default port back to what it was, these tests will likely all have to be updated.
> > > >
> > > > I may not have made my point clear enough.
> > > > What I meant to say is to fix the default port but direct folks to explicitly set the port they are using in a deployment (the current default) so that it doesn't change out from under them - unless they are fine with it changing.
> > > >
> > > >>> Meta note: we shouldn't be so pedantic about policy that we can't back out something that is considered a bug or even mistake.
> > > >>
> > > >> This is my bigger point. Rigidly adhering to the compat guidelines in this instance helps almost no one, while hurting many folks.
> > > >>
> > > >> We basically made a mistake when we decided to change the default NN port with little upside, even between major versions. We discovered this very quickly, and we have an opportunity to fix it now and in so doing likely disrupt very, very few users and downstream applications. If we don't change it, we'll be causing difficulty for our users, downstream developers, and ourselves, potentially for years.
> > > >
> > > > Agreed.
> > > >
> > > >> Best,
> > > >> Aaron
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org