[ https://issues.apache.org/jira/browse/AMBARI-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-20447:
----------------------------------------
    Description: 
The YARN service check fails during a rolling upgrade from HDP-2.4 to HDP-2.6 
(with YARN HA enabled) as follows:
# After the "core master restart" step, the YARN client uses the new (HDP-2.6) 
config and fails with "Class 
org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found". 
Forcing the YARN client to use the old (HDP-2.4) config until the client binary 
is updated helps here.
# After the "core slave restart" step, using the old YARN client config with the 
old YARN client binary no longer helps, because the NM/RM classpath points to 
HDP-2.6. The app job gets scheduled, but then fails with this log:
{code}
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.serviceStart(AMRMClientAsyncImpl.java:96)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:559)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:299)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2208)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2232)
... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2206)
... 10 more
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.async.AMRMClientAsync failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at
{code}
# After the YARN client is updated to the new binary, the service check works fine.
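
When triaging a failure like the one above, it can help to pull the missing class names out of the container log instead of reading the whole trace. The following is only a rough sketch: the helper name missing_classes is hypothetical and not part of Ambari or Hadoop, and the regular expression assumes the "ClassNotFoundException: Class <name> not found" wording seen in this log.

```python
import re

# Matches the message wording seen in the log above:
#   "java.lang.ClassNotFoundException: Class <fully.qualified.Name> not found"
# \s+ also tolerates log lines that were wrapped by a mail client.
_MISSING_CLASS = re.compile(
    r"ClassNotFoundException:\s+Class\s+([\w.$]+)\s+not\s+found"
)


def missing_classes(log_text):
    """Return the distinct class names reported as not found, in first-seen order.

    Hypothetical helper for illustration only.
    """
    seen = []
    for name in _MISSING_CLASS.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Run against the log in this description, it would report a single class, org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider, which points at a config/binary version mismatch rather than a scheduling problem.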
----

Bottom line: this is a known problem with DistributedShell. It was never fixed 
to stop relying on the cluster's configuration, so client configuration changes 
like this one can break DistributedShell apps across upgrades.
Unfortunately, nothing we do now can fix this broken upgrade for 
DistributedShell; a proper fix would mean going back and changing releases that 
have already shipped.

We have to do two things:
# Disable the DistributedShell-based service check when upgrading from 2.4 to 
2.6. RequestHedgingRMFailoverProxyProvider was added in 2.5, so upgrading from 
2.5 to 2.6 is fine.
# Also change yarn-site.xml starting with 2.6 as shown below to avoid this in 
the future. The change replaces $HADOOP_CONF_DIR, which is inherited from the 
NodeManager, with /etc/hadoop/conf/, which is always tied to the client 
version.
{code}
<property>
<name>yarn.application.classpath</name>
<value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
{code}
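
The two steps above could be sketched roughly as follows. This is not the code from the attached patch: the function names are hypothetical, and generalizing "2.4" to "any source version older than 2.5" and "2.6" to "2.6 or later" is an assumption that goes beyond what the description states.

```python
# Per the description, RequestHedgingRMFailoverProxyProvider first ships in 2.5.
HEDGING_PROVIDER_SINCE = (2, 5)


def parse_version(version):
    """Turn a version string such as "2.4" or "2.6.0.3" into a (major, minor) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def skip_dshell_service_check(from_version, to_version):
    """Step 1: skip the DistributedShell-based check for a 2.4 -> 2.6 style upgrade.

    Assumption: any source older than 2.5 lacks the provider class, and any
    target at 2.6 or later ships a config that references it.
    """
    return (parse_version(from_version) < HEDGING_PROVIDER_SINCE
            and parse_version(to_version) >= (2, 6))


def pin_conf_dir(classpath, conf_dir="/etc/hadoop/conf/"):
    """Step 2: replace $HADOOP_CONF_DIR entries in a comma-separated
    yarn.application.classpath value with a fixed client config directory."""
    entries = [e.strip() for e in classpath.split(",") if e.strip()]
    inherited = ("$HADOOP_CONF_DIR", "${HADOOP_CONF_DIR}")
    return ",".join(conf_dir if e in inherited else e for e in entries)
```

For example, skip_dshell_service_check("2.4", "2.6") is true while skip_dshell_service_check("2.5", "2.6") is false, matching the two cases named in the first step.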




> YARN service check failed during HDP 2.4-2.6 rolling upgrade with YARN HA 
> enabled
> ---------------------------------------------------------------------------------
>
>                 Key: AMBARI-20447
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20447
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>            Priority: Blocker
>             Fix For: 2.5.0
>
>         Attachments: AMBARI-20447.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
