Jonathan Hurley created AMBARI-21530:
----------------------------------------

             Summary: Service Checks During Upgrades Should Use Desired Stack
                 Key: AMBARI-21530
                 URL: https://issues.apache.org/jira/browse/AMBARI-21530
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.5.2
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.5.2


During an upgrade from BI 4.2 to HDP 2.6, some service checks failed because 
part of the scheduler framework overwrote their hooks/service folders. At the 
time of orchestration, the cluster's desired stack ID was still BI, while the 
effective stack ID used for the upgrade was HDP, and the HDP value was being 
clobbered.
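
As a rough illustration of the intended behavior (one reading of the 
description above, not Ambari's actual code), the sketch below picks the stack 
a service check should be scheduled against. The names StackId and 
stack_for_service_check are hypothetical; the only point is that the effective 
stack of an in-progress upgrade takes precedence over the cluster's desired 
stack.

```python
from collections import namedtuple

# Hypothetical value type for illustration; not an Ambari class.
StackId = namedtuple("StackId", ["name", "version"])


def stack_for_service_check(cluster_desired, upgrade_effective=None):
    """Return the stack whose hooks/service folders a service check should use.

    Assumption: while an upgrade is in progress, the caller passes the
    upgrade's effective stack; otherwise it passes None.
    """
    if upgrade_effective is not None:
        # Mid-upgrade: do not fall back to the cluster's (older) desired stack.
        return upgrade_effective
    return cluster_desired


bi = StackId("BigInsights", "4.2")
hdp = StackId("HDP", "2.6")
```

With these values, stack_for_service_check(bi, upgrade_effective=hdp) returns 
the HDP stack, and stack_for_service_check(bi) returns the BI stack.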

Exception when running the YARN service check:

{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 91, in <module>
    ServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 54, in service_check
    user=params.smokeuser,
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar' returned 1. 17/07/19 19:34:40 INFO distributedshell.Client: Initializing Client
17/07/19 19:34:40 INFO distributedshell.Client: Running Client
17/07/19 19:34:40 INFO client.RMProxy: Connecting to ResourceManager at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:8050
17/07/19 19:34:40 INFO client.AHSProxy: Connecting to Application History server at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:10200
17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=1
17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster node info from ASM
17/07/19 19:34:40 INFO distributedshell.Client: Got node report from ASM for, nodeId=sid-bigi-3.c.pramod-thangali.internal:45454, nodeAddresssid-bigi-3.c.pramod-thangali.internal:8042, nodeRackName/default-rack, nodeNumContainers0
17/07/19 19:34:40 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
17/07/19 19:34:40 INFO distributedshell.Client: Max mem capability of resources in this cluster 10240
17/07/19 19:34:40 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 3
17/07/19 19:34:40 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
17/07/19 19:34:41 FATAL distributedshell.Client: Error running Client
java.io.FileNotFoundException: File /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2012)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1980)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1945)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.addToLocalResources(Client.java:820)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:532)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:215)
{code}
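
The FileNotFoundException above names the path with a literal "*", which 
suggests the wildcard matched nothing on the host and reached the YARN client 
unexpanded. A minimal sketch of a pre-flight check that would surface this 
earlier is shown below; resolve_single_jar is a hypothetical helper, not part 
of resource_management, and it uses only the Python standard library.

```python
import glob


def resolve_single_jar(pattern):
    """Expand a jar glob and return the single matching path.

    Raises IOError if the pattern matches no file or more than one file,
    instead of letting an unexpanded wildcard reach the command line.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise IOError("no file matches pattern: %s" % pattern)
    if len(matches) > 1:
        raise IOError("pattern is ambiguous: %s -> %s" % (pattern, matches))
    return matches[0]
```

Calling resolve_single_jar with the pattern from the failing command would 
raise IOError on a host where the BI-style /usr/lib/hadoop-yarn jar is absent, 
which points at the wrong stack's service_check.py having been run rather than 
at YARN itself.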




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)