[ 
https://issues.apache.org/jira/browse/AMBARI-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Fernandez updated AMBARI-12113:
-----------------------------------------
    Attachment:     (was: AMBARI-12113.branch-2.1.patch)

> Cluster deployment is missing tez.tar.gz in HDFS since service responsible 
> for uploading tarball is not co-hosted with Tez Client
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-12113
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12113
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>            Priority: Critical
>             Fix For: 2.1.0
>
>         Attachments: AMBARI-12113.branch-2.1.patch, AMBARI-12113.patch
>
>
> STR:
> * Deploy cluster with HDFS, YARN, MR, and Tez on 5 hosts as follows:
> ** Host 1: NameNode, ResourceManager, ZK Server, DataNode, NodeManager
> ** Host 2: Secondary NameNode, App Timeline Server, ZK Server, DataNode, 
> NodeManager.
> ** Host 3: ZK Server, DataNode, NodeManager.
> ** Host 4: Clients
> ** Host 5: Clients
> In this case, Host 1 has RM but no Tez client, so it cannot possibly upload 
> the tez tarball to HDFS.
> Also, consider the following two use cases:
> 1. Install Tez first, which will require YARN.
> 2. Install YARN first, which does not require Tez, but tez.tar.gz still 
> needs to be uploaded when the Tez Service Check runs.
> {code}
> Traceback (most recent call last):
>   File 
> "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py",
>  line 98, in <module>
>     TezServiceCheck().execute()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 216, in execute
>     method(env)
>   File 
> "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py",
>  line 75, in service_check
>     bin_dir = params.hadoop_bin_dir
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", 
> line 157, in __init__
>     self.env.run()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
> line 152, in run
>     self.run_action(resource, action)
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
> line 118, in run_action
>     provider_action()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/execute_hadoop.py",
>  line 55, in action_run
>     environment = self.resource.environment,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", 
> line 157, in __init__
>     self.env.run()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
> line 152, in run
>     self.run_action(resource, action)
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
> line 118, in run_action
>     provider_action()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
>  line 254, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
> line 290, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'hadoop --config 
> /usr/hdp/2.2.6.0-2800/hadoop/conf jar 
> /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount 
> /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/' returned 255. 
> Running OrderedWordCount
> 15/06/17 04:21:50 INFO client.TezClient: Tez Client Version: [ 
> component=tez-api, version=0.5.2.2.2.6.0-2800, 
> revision=790e651b4a64f7589008208580c9790548c2baf8, 
> SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, 
> buildTIme=20150518-1651 ]
> 15/06/17 04:21:51 INFO impl.TimelineClientImpl: Timeline service address: 
> http://c6405.ambari.apache.org:8188/ws/v1/timeline/
> 15/06/17 04:21:51 INFO client.RMProxy: Connecting to ResourceManager at 
> c6405.ambari.apache.org/192.168.64.105:8050
> 15/06/17 04:21:53 INFO client.TezClient: Submitting DAG application with id: 
> application_1434514777618_0005
> 15/06/17 04:21:53 INFO client.TezClientUtils: Using tez.lib.uris value from 
> configuration: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
> java.io.FileNotFoundException: File does not exist: 
> /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1140)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1132)
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1132)
>       at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750)
>       at 
> org.apache.tez.client.TezClientUtils.getLRFileStatus(TezClientUtils.java:127)
>       at 
> org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:178)
>       at 
> org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:721)
>       at 
> org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:689)
>       at 
> org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:667)
>       at org.apache.tez.client.TezClient.submitDAG(TezClient.java:353)
>       at 
> org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:208)
>       at 
> org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:232)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at 
> org.apache.tez.examples.OrderedWordCount.main(OrderedWordCount.java:240)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>       at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:61)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> Analysis:
> tez.tar.gz needs to be copied to HDFS. The problem is that we currently have 
> no way to copy it after all services have been installed and started 
> during cluster deployment, so instead we rely on individual services 
> copying the tarball when they start.
> In order for this to work, the host with Tez Client also needs to have HDFS 
> Client, Yarn Client, and MR Client. Further, copying to HDFS requires 
> NameNode to be up, and DataNodes to be functional.
> AMBARI-9997 had ResourceManager copy the tez tarball; the problem was that if 
> the host with RM didn't have Tez client, it wouldn't find the tarball.
> The change I'm proposing is as follows:
> * Switch the copy from RM to HistoryServer, since HistoryServer already 
> copies the mapreduce tarball.
> * Installing Tez also requires the YARN service, including HistoryServer. 
> HistoryServer is now co-hosted with Tez Client, which guarantees it can 
> copy the tarball.
> * Installing HistoryServer by itself will not copy the tarball. However, if 
> Tez is installed later, its Service Check is responsible for copying the 
> tarball to HDFS, and the host running that check is also guaranteed to 
> have HDFS Client.
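
The co-hosting argument above can be checked with a small model of the host layout from the STR. This is only an illustrative sketch, not Ambari code: the component labels, the `LAYOUT` dictionary, and the `hosts_that_can_upload` helper are hypothetical, and "Clients" on Hosts 4 and 5 is assumed to mean the HDFS, YARN, MR, and Tez clients. Placing HistoryServer on Host 4 is likewise an assumption made to illustrate the proposed co-hosting.

```python
# Hypothetical model of the STR layout; labels are illustrative only.
# Per the analysis, the uploading host needs the Tez client (which ships
# the tarball) plus the HDFS, YARN, and MR clients.
REQUIRED_CLIENTS = {"TEZ_CLIENT", "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT"}

# Assumption: "Clients" means all four client components.
ALL_CLIENTS = set(REQUIRED_CLIENTS)

LAYOUT = {
    "host1": {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host2": {"SECONDARY_NAMENODE", "APP_TIMELINE_SERVER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host3": {"ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    "host4": set(ALL_CLIENTS),
    "host5": set(ALL_CLIENTS),
}


def hosts_that_can_upload(layout, uploader):
    """Return the hosts where `uploader` runs alongside all required clients."""
    return sorted(
        host
        for host, components in layout.items()
        if uploader in components and REQUIRED_CLIENTS <= components
    )


# With ResourceManager as the uploader (the AMBARI-9997 behaviour), no host
# qualifies, because Host 1 has RM but no Tez client.
rm_hosts = hosts_that_can_upload(LAYOUT, "RESOURCEMANAGER")

# Assumed layout after the proposed change: HistoryServer co-hosted with
# the Tez client on Host 4.
fixed_layout = {host: set(components) for host, components in LAYOUT.items()}
fixed_layout["host4"].add("HISTORYSERVER")
hs_hosts = hosts_that_can_upload(fixed_layout, "HISTORYSERVER")

# The Tez Service Check runs on a Tez client host, so it can also upload.
check_hosts = hosts_that_can_upload(LAYOUT, "TEZ_CLIENT")

print(rm_hosts, hs_hosts, check_hosts)
```

Under these assumptions the ResourceManager case returns no candidate hosts, while both the co-hosted HistoryServer and the Tez Service Check have at least one host able to perform the copy.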



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
