[
https://issues.apache.org/jira/browse/AMBARI-22644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330933#comment-16330933
]
Hudson commented on AMBARI-22644:
---------------------------------
FAILURE: Integrated in Jenkins build Ambari-trunk-Commit #8613 (See
[https://builds.apache.org/job/Ambari-trunk-Commit/8613/])
AMBARI-22644 - Node Managers fail to start after Spark2 is patched due (rlevas:
[https://gitbox.apache.org/repos/asf?p=ambari.git&a=commit&h=7749e655e74c7bb4e3ada6b92943730c5e1b6e76])
* (edit)
ambari-server/src/main/resources/stacks/HDP/2.6/upgrades/config-upgrade.xml
* (edit)
ambari-server/src/main/resources/stacks/HDP/2.5/services/YARN/configuration/yarn-site.xml
* (edit)
ambari-server/src/main/resources/stacks/HDP/3.0/services/YARN/configuration/yarn-site.xml
* (edit)
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py
* (edit)
ambari-server/src/main/resources/common-services/YARN/3.0.0.3.0/package/scripts/params_linux.py
> Node Managers fail to start after Spark2 is patched due to CNF
> YarnShuffleService
> ---------------------------------------------------------------------------------
>
> Key: AMBARI-22644
> URL: https://issues.apache.org/jira/browse/AMBARI-22644
> Project: Ambari
> Issue Type: Bug
> Affects Versions: 2.6.1
> Reporter: Vivek Sharma
> Assignee: Jonathan Hurley
> Priority: Critical
> Fix For: 2.6.2
>
>
> *STR*
> # Deploy HDP-2.6.4.0 cluster with Ambari-2.6.1.0-114
> # Apply HBase patch Upgrade on the cluster (this step is optional)
> # Then apply Spark2 patch Upgrade on the cluster
> # Restart Node Managers
> *Result*
> NodeManager restart fails with the following error:
> {code}
> 2017-12-10 07:17:02,559 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NodeManager metrics system shutdown complete.
> 2017-12-10 07:17:02,559 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
> org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:291)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:197)
>         at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:165)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:131)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 8 more
> 2017-12-10 07:17:02,562 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
> {code}
> The Spark properties are being written out correctly, as per AMBARI-22525.
> Initially, we had defined the Spark properties for ATS like this:
> {code}
> <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
> <value>{{stack_root}}/${hdp.version}/spark/aux/*</value>
> {code}
> When YARN upgrades without Spark, we run into AMBARI-22525. It seems the
> shuffle classes are installed as part of the RPM dependencies, but the
> SparkATSPlugin is not.
> So:
> - If we use YARN's version for the Spark classes, then ATS can't find
> SparkATSPlugin since that is not part of YARN.
> - If we use Spark's version for the classes, then Spark can never upgrade
> without YARN since NodeManager can't find the new Spark classes.
> However, it seems that shuffle and ATS use different properties. We changed
> all three properties in AMBARI-22525:
> {code}
> yarn.nodemanager.aux-services.spark2_shuffle.classpath
> yarn.nodemanager.aux-services.spark_shuffle.classpath
> yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath
> {code}
> It seems that what we need to do is change the Spark shuffle classpaths back
> to hdp.version, but leave ATS using the new versioned path, since we're
> guaranteed to have Spark installed on the ATS machine.
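> To make that concrete, the resulting yarn-site.xml entries would look roughly
> like this. This is only a sketch: the spark2_shuffle value and the ATS
> plugin-classpath value are assumptions (they are not quoted above), and only
> the property names and the spark_shuffle value come from this issue:
> {code}
> <!-- Shuffle services: back to the runtime-substituted hdp.version, so a
>      Spark-only patch upgrade does not strand the NodeManager classpath -->
> <property>
>   <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
>   <value>{{stack_root}}/${hdp.version}/spark/aux/*</value>
> </property>
> <property>
>   <!-- Hypothetical value, mirroring the spark_shuffle pattern -->
>   <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
>   <value>{{stack_root}}/${hdp.version}/spark2/aux/*</value>
> </property>
> <!-- ATS plugin: keep the Ambari-managed versioned Spark path (hypothetical
>      value), since Spark is guaranteed to be installed on the ATS host -->
> <property>
>   <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
>   <value>{{stack_root}}/{{spark_version}}/spark/hdplib/*</value>
> </property>
> {code}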
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)