[ https://issues.apache.org/jira/browse/AMBARI-22628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hurley updated AMBARI-22628:
-------------------------------------
Description:
Installing a new cluster can write values to {{yarn-site.xml}} that have
{{None}} in place of the version in the Spark classpath properties:
{code:xml}
<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
<value>/usr/hdp/None/spark2/aux/*</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
<value>/usr/hdp/None/spark/aux/*</value>
</property>
<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
<value>/usr/hdp/None/spark/hdpLib/*</value>
</property>
{code}
The cause is that YARN clients on hosts without daemons never receive a
restart command after the initial {{yarn-site.xml}} is written, so the correct
values are never filled in. This causes failures when jobs are run on these
nodes:
{code}
2017-12-04 10:16:41,789 INFO service.AbstractService
(AbstractService.java:noteFailure(272)) - Service
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed
in state INITED; cause: java.lang.ClassNotFoundException:
org.apache.spark.network.yarn.YarnShuffleService
java.lang.ClassNotFoundException:
org.apache.spark.network.yarn.YarnShuffleService
{code}
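To confirm whether a given host is affected, one option is to scan its
{{yarn-site.xml}} for values that still contain the literal {{None}} path
segment. The following is a minimal sketch, not part of Ambari: the function
name {{find_unresolved_versions}}, the {{/None/}} marker, and the sample
document (including the {{rm.example.com}} hostname) are assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET


def find_unresolved_versions(yarn_site_xml, marker="/None/"):
    """Return {property name: value} for every value containing the marker.

    Hypothetical helper for illustration; it only inspects the XML text
    it is given and does not talk to Ambari.
    """
    root = ET.fromstring(yarn_site_xml)
    bad = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if marker in value:
            bad[name] = value
    return bad


# Sample input modelled on the properties quoted in this issue, plus one
# unaffected property (made up) to show that it is not reported.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
    <value>/usr/hdp/None/spark2/aux/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
    <value>/usr/hdp/None/spark/aux/*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(find_unresolved_versions(SAMPLE).items()):
        print("%s = %s" % (name, value))
```

On a real host the XML text would be read from that host's own
{{yarn-site.xml}}; the location of that file is not stated in this issue.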
was:
Downloaded client configs have invalid values for spark properties in
yarn-site.xml.
Issue: spark_version variable is replaced by 'None' in the spark related config
properties in yarn-site in the client configs downloaded.
Attaching downloaded yarn-site.xml
[^yarn-site.xml]
Properties with issue:
{code:java}
<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
<value>/usr/hdp/None/spark2/aux/*</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
<value>/usr/hdp/None/spark/aux/*</value>
</property>
<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
<value>/usr/hdp/None/spark/hdpLib/*</value>
</property>
{code}
The cause for this is that YARN Clients on hosts without daemons never get a
restart command after the initial {{yarn-site.xml}}, and can never fill in the
correct values.
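The literal {{None}} in these paths is what Python string formatting produces
when the value being substituted is still unset. The sketch below only
illustrates that behaviour and is not Ambari's actual substitution code, which
is not shown in this issue; the template string, both function names, and the
example version {{2.6.3.0-235}} are hypothetical.

```python
TEMPLATE = "/usr/hdp/{spark_version}/spark2/aux/*"  # hypothetical template


def render_classpath(template, spark_version):
    """Naive rendering: str.format() stringifies None as 'None'."""
    return template.format(spark_version=spark_version)


def render_classpath_strict(template, spark_version):
    """Defensive variant: refuse to render until the version is known."""
    if not spark_version:
        raise ValueError("spark_version is not resolved yet")
    return template.format(spark_version=spark_version)


if __name__ == "__main__":
    # An unset version silently ends up in the path.
    print(render_classpath(TEMPLATE, None))
    # A resolved version (made-up value) renders as intended.
    print(render_classpath_strict(TEMPLATE, "2.6.3.0-235"))
```

Failing loudly on an unset version, as in the second function, is one possible
design choice; it surfaces the problem when the config is written rather than
when a job later fails to load the shuffle service.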
> YARN Shuffle Service Can't Be Found On Client-Only Nodes After New Cluster
> Install
> ----------------------------------------------------------------------------------
>
> Key: AMBARI-22628
> URL: https://issues.apache.org/jira/browse/AMBARI-22628
> Project: Ambari
> Issue Type: Bug
> Affects Versions: 2.6.1
> Reporter: Kishor Ramakrishnan
> Assignee: Jonathan Hurley
> Priority: Blocker
> Fix For: 2.6.1
>
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)