No, the proposal was to only fix the NN port change - as I understood it.

On Thu, Jan 11, 2018 at 2:01 PM, Eric Yang <ey...@hortonworks.com> wrote:
> If I am reading this correctly, Daryn and Larry are in favor of a complete
> revert instead of namenode only. Please chime in if I am wrong. This is
> the reason that I try to explore each perspective to understand the cost
> of each option. It appears that we have fragmented opinions, and only one
> choice will serve the needs of the majority of the community. It would be
> good for a PMC member to call the vote at a reasonable pace to address
> this issue and reduce the pain on either side.
>
> Regards,
> Eric
>
> *From: *Chris Douglas <cdoug...@apache.org>
> *Date: *Wednesday, January 10, 2018 at 7:36 PM
> *To: *Eric Yang <ey...@hortonworks.com>
> *Cc: *"Aaron T. Myers" <a...@apache.org>, Daryn Sharp <da...@oath.com>,
> Hadoop Common <common-dev@hadoop.apache.org>, larry mccay
> <lmc...@apache.org>
> *Subject: *Re: When are incompatible changes acceptable (HDFS-12990)
>
> Isn't this limited to reverting the 8020 -> 9820 change? -C
>
> On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <ey...@hortonworks.com> wrote:
>
> The fix in HDFS-9427 can potentially bring in new customers because there
> is less chance for a newcomer to encounter the “port already in use”
> problem. If we make the change according to HDFS-12990, then this
> incompatible change does not make the incompatible change compatible.
> Other ports are not reverted according to HDFS-12990. Users will encounter
> the bad taste in the mouth that HDFS-9427 attempts to solve. Please do
> consider both the negative side effects of reverting and the incompatible
> minor release change. Thanks
>
> Regards,
> Eric
>
> From: larry mccay <lmc...@apache.org>
> Date: Wednesday, January 10, 2018 at 10:53 AM
> To: Daryn Sharp <da...@oath.com>
> Cc: "Aaron T. Myers" <a...@apache.org>, Eric Yang <ey...@hortonworks.com>,
> Chris Douglas <cdoug...@apache.org>, Hadoop Common
> <common-dev@hadoop.apache.org>
> Subject: Re: When are incompatible changes acceptable (HDFS-12990)
>
> On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:
>
> I fully agree the port changes should be reverted. Although
> "incompatible", the potential impact to existing 2.x deploys is huge. I'd
> rather inconvenience 3.0 deploys that comprise <1% of customers. An
> incompatible change to revert an incompatible change is called
> compatibility.
>
> +1
>
> Most importantly, consider that there is no good upgrade path for existing
> deploys, esp. large and/or multi-cluster environments. It’s only feasible
> for first-time deploys or simple single-cluster upgrades willing to take
> downtime. Let's consider a few reasons why:
>
> 1. RU is completely broken. Running jobs will fail. If MR on hdfs
> bundles the configs, there's no way to transparently coordinate the switch
> to the new bundle with the port changed. Job submissions will fail.
>
> 2. Users generally do not add the rpc port number to uris so unless their
> configs are updated they will contact the wrong port. Seamlessly
> coordinating the conf change without massive failures is impossible.
>
> 3. Even if client confs are updated, they will break in a multi-cluster
> env with NNs using different ports. Users/services will be forced to add
> the port. The cited hive "issue" is not a bug since it's the only way to
> work in a multi-port env.
>
> 4. Coordinating the port add/change of uris in systems everywhere (you
> know something will be missed), updating of confs, restarting all services,
> requiring customers to redeploy their workflows in sync with the NN
> upgrade, will cause mass disruption and downtime that will be unacceptable
> for production environments.
>
> This is a solution to a non-existent problem. Ports can be bound by
> multiple processes but only 1 can listen. Maybe multiple listeners is an
> issue for compute nodes but not responsibly managed service nodes. Ie. Who
> runs arbitrary services on the NNs that bind to random ports? Besides, the
> default port is and was ephemeral so it solved nothing.
>
> This either standardizes ports to a particular customer's ports or is a
> poorly thought out whim. In either case, the needs of the many outweigh
> the needs of the few/none (3.0 users). The only logical conclusion is
> revert. If a particular site wants to change default ports and deal with
> the massive fallout, they can explicitly change the ports themselves.
>
> Daryn
>
> On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <a...@apache.org> wrote:
> On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
>
> > While I agree the original port change was unnecessary, I don’t think
> > Hadoop NN port change is a bad thing.
> >
> > I worked for a Hadoop distro where the NN RPC port defaulted to port
> > 9000. When we migrated from BigInsights to IOP and now to HDP, we had to
> > move customer Hive metadata to the new NN RPC port. It only took one
> > developer (myself) to write the tool for the migration. The incurred
> > workload is not as bad as most people anticipated because Hadoop depends
> > on configuration files for referencing the namenode. Most of the code
> > can work transparently. It helped to harden the downstream testing tools
> > to be more robust.
>
> While there are of course ways to deal with this, the question really
> should be whether or not it's a desirable thing to do to our users.
>
> > We will never know how many people are actively working on Hadoop 3.0.0.
> > Perhaps, couple hundred developers or thousands.
>
> You're right that we can't know for sure, but I strongly suspect that this
> is a substantial overestimate. Given how conservative Hadoop operators tend
> to be, I view it as exceptionally unlikely that many deployments have been
> created on or upgraded to Hadoop 3.0.0 since it was released less than a
> month ago.
>
> Further, I hope you'll agree that the number of
> users/developers/deployments/applications which are currently on Hadoop 2.x
> is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so
> quickly. When all of those users upgrade to any 3.x version, they will
> encounter this needless incompatible change and be forced to work around
> it.
>
> > I think the switch back may have saved few developers work, but there
> > could be more people getting impacted at unexpected minor release change
> > in the future. I recommend keeping current values to avoid rule bending
> > and future frustrations.
>
> That we allow this incompatible change now does not mean that we are
> categorically allowing more incompatible changes in the future. My point is
> that we should in all instances evaluate the merit of any incompatible
> change on a case-by-case basis. This is not an exceptional circumstance -
> we've made incompatible changes in the past when appropriate, e.g. breaking
> some clients to address a security issue. I and others believe that in this
> case the benefits greatly outweigh the downsides of changing this back to
> what it has always been.
>
> Best,
> Aaron
>
> > Regards,
> > Eric
> >
> > On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:
> >
> > Particularly since 9820 isn't in the contiguous range of ports in
> > HDFS-9427, is there any value in this change?
> >
> > Let's change it back to prevent the disruption to users, but
> > downstream projects should treat this as a bug in their tests. Please
> > open JIRAs in affected projects. -C
> >
> > On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org> wrote:
> > > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org>
> > > wrote:
> > >
> > >> Thanks a lot for the response, Larry. Comments inline.
> > >>
> > >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org>
> > >> wrote:
> > >>
> > >>> Question...
> > >>>
> > >>> Can this be addressed in some way during or before upgrade that
> > >>> allows it to only affect new installs?
> > >>> Even a config based workaround prior to upgrade might make this
> > >>> change less disruptive.
> > >>>
> > >>> If part of the upgrade process includes a step (maybe even a
> > >>> script) to set the NN RPC port explicitly beforehand then it would
> > >>> allow existing deployments and related clients to remain whole -
> > >>> otherwise it will uptake the new default port.
> > >>
> > >> Perhaps something like this could be done, but I think there are
> > >> downsides to anything like this. For example, I'm sure there are
> > >> plenty of applications written on top of Hadoop that have tests which
> > >> hard-code the port number. Nothing we do in a setup script will help
> > >> here. If we don't change the default port back to what it was, these
> > >> tests will likely all have to be updated.
> > >
> > > I may not have made my point clear enough.
> > > What I meant to say is to fix the default port but direct folks to
> > > explicitly set the port they are using in a deployment (the current
> > > default) so that it doesn't change out from under them - unless they
> > > are fine with it changing.
> > >
> > >>> Meta note: we shouldn't be so pedantic about policy that we can't
> > >>> back out something that is considered a bug or even mistake.
> > >>
> > >> This is my bigger point. Rigidly adhering to the compat guidelines in
> > >> this instance helps almost no one, while hurting many folks.
> > >>
> > >> We basically made a mistake when we decided to change the default NN
> > >> port with little upside, even between major versions. We discovered
> > >> this very quickly, and we have an opportunity to fix it now and in so
> > >> doing likely disrupt very, very few users and downstream
> > >> applications. If we don't change it, we'll be causing difficulty for
> > >> our users, downstream developers, and ourselves, potentially for
> > >> years.
> > >
> > > Agreed.
> > >
> > >> Best,
> > >> Aaron
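
For reference, the workaround discussed in the quoted thread (set the NN RPC port explicitly before upgrading, so that URIs without a port do not silently follow a changed default) could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name pin_namenode_port is made up, 8020 is the old default named in the thread, and the URI authority is assumed to be a real NameNode host rather than an HA nameservice ID (a logical nameservice must not get a port appended).

```python
from urllib.parse import urlsplit, urlunsplit

# Old default NN RPC port, per the "8020 -> 9820" change discussed in the thread.
LEGACY_NN_RPC_PORT = 8020


def pin_namenode_port(uri: str, port: int = LEGACY_NN_RPC_PORT) -> str:
    """Return `uri` with an explicit port if it is an hdfs:// URI without one.

    Hypothetical helper, not part of Hadoop. Assumes the authority is a
    physical host; HA nameservice IDs should be left alone by the caller.
    """
    parts = urlsplit(uri)
    # Leave non-HDFS URIs and URIs without a host untouched.
    if parts.scheme != "hdfs" or not parts.hostname:
        return uri
    # An explicit port already protects the URI from a default change.
    if parts.port is not None:
        return uri
    return urlunsplit(parts._replace(netloc=f"{parts.netloc}:{port}"))


if __name__ == "__main__":
    for value in ("hdfs://nn1.example.com/user/hive",
                  "hdfs://nn1.example.com:9000/user/hive"):
        print(value, "->", pin_namenode_port(value))
```

Something along these lines would only cover values that a script can reach, such as fs.defaultFS or stored metadata locations; as noted in the thread, it does nothing for tests or applications that hard-code the port.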