[
https://issues.apache.org/jira/browse/AMBARI-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Nettleton updated AMBARI-12113:
--------------------------------------
Fix Version/s: 2.1.1 (was: 2.1.0)
> Cluster deployment is missing tez.tar.gz in HDFS since service responsible
> for uploading tarball is not co-hosted with Tez Client
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: AMBARI-12113
> URL: https://issues.apache.org/jira/browse/AMBARI-12113
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Alejandro Fernandez
> Assignee: Alejandro Fernandez
> Priority: Critical
> Fix For: 2.1.1
>
> Attachments: AMBARI-12113.branch-2.1.patch, AMBARI-12113.patch
>
>
> STR:
> * Deploy cluster with HDFS, YARN, MR, and Tez on 5 hosts as follows,
> ** Host 1: NameNode, ResourceManager, ZK Server, DataNode, NodeManager
> ** Host 2: Secondary NameNode, App Timeline Server, ZK Server, DataNode,
> NodeManager.
> ** Host 3: ZK Server, DataNode, NodeManager.
> ** Host 4: Clients
> ** Host 5: Clients
> In this case, Host 1 has RM but no Tez client, so it cannot possibly upload
> the tez tarball to HDFS.
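The co-hosting problem in this layout can be checked mechanically. The following is only a rough sketch: the component identifiers, the host keys, and the helper name `hosts_able_to_upload` are made up for illustration, and it assumes that "Clients" on Hosts 4 and 5 includes the Tez client.

```python
# Hypothetical component names; the layout mirrors the STR above.
LAYOUT = {
    "host1": {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host2": {"SECONDARY_NAMENODE", "APP_TIMELINE_SERVER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host3": {"ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    # Assumption: "Clients" means all client components, including Tez.
    "host4": {"HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT", "TEZ_CLIENT"},
    "host5": {"HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT", "TEZ_CLIENT"},
}


def hosts_able_to_upload(layout, uploader, tarball_owner="TEZ_CLIENT"):
    """Return the hosts where the uploading component sits next to the
    component that ships the tarball on local disk."""
    return sorted(
        host for host, components in layout.items()
        if uploader in components and tarball_owner in components
    )


# ResourceManager is never co-hosted with the Tez client in this layout.
print(hosts_able_to_upload(LAYOUT, "RESOURCEMANAGER"))  # prints []
```

An empty result means that no host running the uploader has the tarball available locally, which is the situation described above.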
> Also, consider the following 2 use cases:
> 1. Install Tez first, which will require YARN.
> 2. Install YARN first, which does not require Tez; tez.tar.gz still needs to
> be uploaded when the Tez Service Check runs.
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py", line 98, in <module>
>     TezServiceCheck().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py", line 75, in service_check
>     bin_dir = params.hadoop_bin_dir
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/execute_hadoop.py", line 55, in action_run
>     environment = self.resource.environment,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 254, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 290, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'hadoop --config /usr/hdp/2.2.6.0-2800/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/' returned 255.
> Running OrderedWordCount
> 15/06/17 04:21:50 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.5.2.2.2.6.0-2800, revision=790e651b4a64f7589008208580c9790548c2baf8, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTIme=20150518-1651 ]
> 15/06/17 04:21:51 INFO impl.TimelineClientImpl: Timeline service address: http://c6405.ambari.apache.org:8188/ws/v1/timeline/
> 15/06/17 04:21:51 INFO client.RMProxy: Connecting to ResourceManager at c6405.ambari.apache.org/192.168.64.105:8050
> 15/06/17 04:21:53 INFO client.TezClient: Submitting DAG application with id: application_1434514777618_0005
> 15/06/17 04:21:53 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
> java.io.FileNotFoundException: File does not exist: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
>     at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1140)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1132)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1132)
>     at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750)
>     at org.apache.tez.client.TezClientUtils.getLRFileStatus(TezClientUtils.java:127)
>     at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:178)
>     at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:721)
>     at org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:689)
>     at org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:667)
>     at org.apache.tez.client.TezClient.submitDAG(TezClient.java:353)
>     at org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:208)
>     at org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:232)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.tez.examples.OrderedWordCount.main(OrderedWordCount.java:240)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>     at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:61)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
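To confirm the failure mode outside of the service check, one could probe HDFS for the tarball directly. This is a minimal sketch, not Ambari code: it assumes the `hdfs` CLI is on the PATH of the host where it runs, and the helper names and the injectable `run` parameter are invented here so the logic can be exercised without a cluster. The path is the `tez.lib.uris` value from the log above.

```python
import subprocess

# Path reported by TezClientUtils in the log above.
TEZ_TARBALL = "/hdp/apps/2.2.6.0-2800/tez/tez.tar.gz"


def build_exists_command(hdfs_path):
    """Build the shell command; `hdfs dfs -test -e` exits 0 if the path exists."""
    return ["hdfs", "dfs", "-test", "-e", hdfs_path]


def tarball_exists(hdfs_path, run=subprocess.call):
    """Return True when the path is present in HDFS.

    `run` is a hypothetical injection point: any callable that takes the
    argument list and returns the process exit code.
    """
    return run(build_exists_command(hdfs_path)) == 0


# On a cluster host one would call: tarball_exists(TEZ_TARBALL)
```

In the scenario reported here the probe would be expected to return False, matching the FileNotFoundException in the log.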
> Analysis:
> tez.tar.gz needs to be copied to HDFS. The problem is that we don't have a
> way right now to copy it after all services have been installed and started
> during cluster deployment, so instead, we rely on services starting to copy
> the tarball.
> In order for this to work, the host with Tez Client also needs to have HDFS
> Client, YARN Client, and MR Client. Further, copying to HDFS requires
> NameNode to be up, and DataNodes to be functional.
> AMBARI-9997 had ResourceManager copy the tez tarball; the problem was that if
> the host with RM didn't have Tez client, it wouldn't find the tarball.
> The change I'm proposing is as follows:
> * Switch this to HistoryServer instead of RM, since HistoryServer already
> copies the mapreduce tarball.
> * Installing Tez requires the YARN service, including HistoryServer.
> HistoryServer is now co-hosted with Tez Client, which guarantees it can
> copy the tarball.
> * Installing HistoryServer by itself will not copy the tarball. However, if
> Tez is installed later, its Service Check is responsible for copying the
> tarball to HDFS, and that host is also guaranteed to have HDFS Client.
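The proposed division of responsibility can be summarized as a small decision function. This is an illustrative reading of the bullets above, not the contents of the attached patch; the actor names, component identifiers, and the function `should_copy_tez_tarball` are all hypothetical.

```python
def should_copy_tez_tarball(actor, host_components):
    """Decide whether `actor`, running on a host with `host_components`,
    would copy tez.tar.gz to HDFS under the proposal.

    Assumptions (hypothetical names, not taken from the patch):
      - "HISTORYSERVER" copies on start only if the Tez client is local.
      - "TEZ_SERVICE_CHECK" copies if both Tez and HDFS clients are local.
      - Any other actor, including "RESOURCEMANAGER", never copies.
    """
    has_tez = "TEZ_CLIENT" in host_components
    if actor == "HISTORYSERVER":
        return has_tez
    if actor == "TEZ_SERVICE_CHECK":
        return has_tez and "HDFS_CLIENT" in host_components
    return False


# HistoryServer installed without Tez: nothing to copy yet.
print(should_copy_tez_tarball("HISTORYSERVER", {"HISTORYSERVER"}))  # prints False
# Tez added later: the service check takes over.
print(should_copy_tez_tarball("TEZ_SERVICE_CHECK", {"TEZ_CLIENT", "HDFS_CLIENT"}))  # prints True
```

The point of the sketch is that both copy paths are gated on the Tez client being present on the local host, which is the condition the ResourceManager-based approach from AMBARI-9997 could not guarantee.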