[ https://issues.apache.org/jira/browse/AMBARI-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-21530:
-------------------------------------
    Status: Patch Available  (was: Open)

> Service Checks During Upgrades Should Use Desired Stack
> -------------------------------------------------------
>
>                 Key: AMBARI-21530
>                 URL: https://issues.apache.org/jira/browse/AMBARI-21530
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.2
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.5.2
>
>         Attachments: AMBARI-21530.patch
>
>
> During an upgrade from BI 4.2 to HDP 2.6, some service checks failed because their hooks/service folders were being overwritten by part of the scheduler framework. At orchestration time, the cluster's desired stack ID was still BI, but the effective stack ID used for the upgrade was HDP (whose folders were being clobbered).
> Exception when running the YARN service check:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 91, in <module>
>     ServiceCheck().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/YARN/package/scripts/service_check.py", line 54, in service_check
>     user=params.smokeuser,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'yarn org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command ls -num_containers 1 -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar' returned 1. 17/07/19 19:34:40 INFO distributedshell.Client: Initializing Client
> 17/07/19 19:34:40 INFO distributedshell.Client: Running Client
> 17/07/19 19:34:40 INFO client.RMProxy: Connecting to ResourceManager at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:8050
> 17/07/19 19:34:40 INFO client.AHSProxy: Connecting to Application History server at sid-bigi-2.c.pramod-thangali.internal/10.240.0.47:10200
> 17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=1
> 17/07/19 19:34:40 INFO distributedshell.Client: Got Cluster node info from ASM
> 17/07/19 19:34:40 INFO distributedshell.Client: Got node report from ASM for, nodeId=sid-bigi-3.c.pramod-thangali.internal:45454, nodeAddresssid-bigi-3.c.pramod-thangali.internal:8042, nodeRackName/default-rack, nodeNumContainers0
> 17/07/19 19:34:40 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
> 17/07/19 19:34:40 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
> 17/07/19 19:34:40 INFO distributedshell.Client: Max mem capability of resources in this cluster 10240
> 17/07/19 19:34:40 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 3
> 17/07/19 19:34:40 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
> 17/07/19 19:34:41 FATAL distributedshell.Client: Error running Client
> java.io.FileNotFoundException: File /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar does not exist
>       at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
>       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
>       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2012)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1980)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1945)
>       at org.apache.hadoop.yarn.applications.distributedshell.Client.addToLocalResources(Client.java:820)
>       at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:532)
>       at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:215)
> {code}
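A side note on the error text above: the `FileNotFoundException` reports the literal glob pattern `hadoop-yarn-applications-distributedshell*.jar` because, with default shell options, an unmatched glob is passed through unexpanded to the command, and Hadoop's `FileSystem` then looks for a file literally named with the `*`. A quick demonstration of that default shell behavior (the directory name here is just a throwaway for illustration):

```shell
# In an empty directory, a glob with no matches stays a literal string
# (default behavior without nullglob/failglob).
cd "$(mktemp -d)"
echo demo-*.jar
```

So the traceback is consistent with the jar simply being absent from the expected stack location, not with a quoting bug in the service check itself.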



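The fix described by the issue title can be sketched as a stack-resolution rule. The snippet below is a minimal illustration only, with entirely hypothetical names (`stack_for_command`, `Cluster`, `Upgrade` are not Ambari's actual API): during an upgrade, ordinary commands follow the upgrade's effective/target stack, but service checks should resolve against the cluster's *desired* stack so their hooks/service folders are not the ones being rewritten by the scheduler.

```python
from collections import namedtuple

# Hypothetical stand-ins for the cluster and in-flight upgrade state.
Cluster = namedtuple("Cluster", "desired_stack")
Upgrade = namedtuple("Upgrade", "effective_stack")

def stack_for_command(cluster, command_type, upgrade=None):
    """Pick which stack's hooks/service folders back a command.

    Illustrative only: non-service-check commands issued during an
    upgrade use the upgrade's effective stack; service checks always
    use the cluster's desired stack, which is the behavior this issue
    asks for.
    """
    if upgrade is not None and command_type != "SERVICE_CHECK":
        return upgrade.effective_stack   # e.g. "HDP-2.6"
    return cluster.desired_stack         # e.g. "BigInsights-4.2"

cluster = Cluster(desired_stack="BigInsights-4.2")
upgrade = Upgrade(effective_stack="HDP-2.6")
print(stack_for_command(cluster, "SERVICE_CHECK", upgrade))  # BigInsights-4.2
print(stack_for_command(cluster, "RESTART", upgrade))        # HDP-2.6
```

In the failing scenario, the service check was instead orchestrated against the effective (HDP) stack whose folders were concurrently being clobbered, producing the missing-jar failure quoted above.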
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
