[ https://issues.apache.org/jira/browse/AMBARI-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-21530:
-------------------------------------
    Status: Patch Available  (was: Open)

> Service Checks During Upgrades Should Use Desired Stack
> -------------------------------------------------------
>
>                 Key: AMBARI-21530
>                 URL: https://issues.apache.org/jira/browse/AMBARI-21530
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.2
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.5.2
>
>         Attachments: AMBARI-21530.patch
>
>
> During an upgrade from BI 4.2 to HDP 2.6, some service checks were failing.
> This is because the service checks were having their hooks/service folders
> overwritten by some of the scheduler framework. At the time of orchestration,
> the cluster desired ID was still BI but the effective ID used for the upgrade
> was HDP (which was being clobbered)
>
> Exception on running YARN service check:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 91, in <module>
>     ServiceCheck().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 54, in service_check
>     user=params.smokeuser,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar' returned 1. 17/07/19 19:34:40 INFO distributedshell.Client: Initializing Client
> 17/07/19 19:34:40 INFO distributedshell.Client: Running Client
> 17/07/19 19:34:40 INFO client.RMProxy: Connecting to ResourceManager at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:8050
> 17/07/19 19:34:40 INFO client.AHSProxy: Connecting to Application History server at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:10200
> 17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=1
> 17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster node info from ASM
> 17/07/19 19:34:40 INFO distributedshell.Client: Got node report from ASM for, nodeId=sid-bigi-3.c.pramod-thangali.internal:45454, nodeAddresssid-bigi-3.c.pramod-thangali.internal:8042, nodeRackName/default-rack, nodeNumContainers0
> 17/07/19 19:34:40 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
> 17/07/19 19:34:40 INFO distributedshell.Client: Max mem capability of resources in this cluster 10240
> 17/07/19 19:34:40 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 3
> 17/07/19 19:34:40 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
> 17/07/19 19:34:41 FATAL distributedshell.Client: Error running Client
> java.io.FileNotFoundException: File /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar does not exist
>       at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
>       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2012)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1980)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1945)
>       at org.apache.hadoop.yarn.applications.distributedshell.Client.addToLocalResources(Client.java:820)
>       at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:532)
>       at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:215)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
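
Note on the trace: the literal `*` in the FileNotFoundException suggests the jar pattern was never expanded, which is what a shell does by default when a glob matches nothing. That would be consistent with a BigInsights-layout path being used on a host where no such jar exists, although the issue does not say so explicitly. A minimal sketch of a pre-flight check follows; `resolve_single_jar` is a hypothetical helper for illustration and is not part of Ambari's `resource_management` library.

```python
import glob


def resolve_single_jar(pattern):
    """Return the first path matching pattern (sorted), or None if nothing matches."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        return None
    return matches[0]


# Hypothetical pre-flight check; the pattern is copied from the failing command.
jar_pattern = "/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar"
jar = resolve_single_jar(jar_pattern)
if jar is None:
    # An unmatched glob would otherwise reach the YARN client as a literal string.
    print("no jar matches %s; is this script meant for the installed stack?" % jar_pattern)
else:
    print("would run distributedshell with -jar %s" % jar)
```

Checking the pattern before invoking the client would turn the Java stack trace into a short message that points at the stack mismatch rather than at the file system.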