Alejandro Fernandez created AMBARI-12113:
--------------------------------------------
Summary: Cluster deployment is missing tez.tar.gz in HDFS since
service responsible for uploading tarball is not co-hosted with Tez Client
Key: AMBARI-12113
URL: https://issues.apache.org/jira/browse/AMBARI-12113
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.1.0
Reporter: Alejandro Fernandez
Assignee: Alejandro Fernandez
Priority: Critical
Fix For: 2.1.0
STR:
* Deploy a cluster with HDFS, YARN, MR, and Tez on 5 hosts as follows:
** Host 1: NameNode, ResourceManager, ZK Server, DataNode, NodeManager
** Host 2: Secondary NameNode, App Timeline Server, ZK Server, DataNode,
NodeManager
** Host 3: ZK Server, DataNode, NodeManager
** Host 4: Clients
** Host 5: Clients
In this case, Host 1 has RM but no Tez Client, so the tez tarball is not
available locally and RM cannot upload it to HDFS.
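The layout problem can be checked mechanically. The following is a minimal sketch, not Ambari code: the component identifiers, the expansion of "Clients" into four client components, and the helper name hosts_missing_component are all assumptions made for illustration.

```python
# Hypothetical component identifiers; "Clients" from the STR is expanded
# to an assumed set of four client components.
CLIENTS = frozenset({"HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT", "TEZ_CLIENT"})

LAYOUT = {
    "host1": {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    "host2": {"SECONDARY_NAMENODE", "APP_TIMELINE_SERVER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host3": {"ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    "host4": set(CLIENTS),
    "host5": set(CLIENTS),
}


def hosts_missing_component(layout, uploader, required):
    """Return hosts that run `uploader` but lack `required`, sorted by name."""
    return sorted(
        host
        for host, components in layout.items()
        if uploader in components and required not in components
    )


if __name__ == "__main__":
    # Hosts where ResourceManager would be asked to upload a tarball it does not have.
    print(hosts_missing_component(LAYOUT, "RESOURCEMANAGER", "TEZ_CLIENT"))
```

With the layout above, the only host flagged is the one running ResourceManager, which matches the failure described in the STR.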
Also, consider the following 2 use cases:
1. Install Tez first, which will require YARN.
2. Install YARN first, which does not require Tez, but tez.tar.gz still needs
to be uploaded when the Tez Service Check runs.
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py", line 98, in <module>
    TezServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py", line 75, in service_check
    bin_dir = params.hadoop_bin_dir
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/execute_hadoop.py", line 55, in action_run
    environment = self.resource.environment,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 254, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 290, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'hadoop --config /usr/hdp/2.2.6.0-2800/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/' returned 255. Running OrderedWordCount
15/06/17 04:21:50 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.5.2.2.2.6.0-2800, revision=790e651b4a64f7589008208580c9790548c2baf8, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTIme=20150518-1651 ]
15/06/17 04:21:51 INFO impl.TimelineClientImpl: Timeline service address: http://c6405.ambari.apache.org:8188/ws/v1/timeline/
15/06/17 04:21:51 INFO client.RMProxy: Connecting to ResourceManager at c6405.ambari.apache.org/192.168.64.105:8050
15/06/17 04:21:53 INFO client.TezClient: Submitting DAG application with id: application_1434514777618_0005
15/06/17 04:21:53 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
java.io.FileNotFoundException: File does not exist: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1140)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1132)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1132)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750)
    at org.apache.tez.client.TezClientUtils.getLRFileStatus(TezClientUtils.java:127)
    at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:178)
    at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:721)
    at org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:689)
    at org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:667)
    at org.apache.tez.client.TezClient.submitDAG(TezClient.java:353)
    at org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:208)
    at org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:232)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.tez.examples.OrderedWordCount.main(OrderedWordCount.java:240)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}
Analysis:
tez.tar.gz needs to be copied to HDFS. We currently have no way to copy it
after all services have been installed and started during cluster deployment,
so instead we rely on a service's start to copy the tarball.
For this to work, the host with Tez Client also needs to have HDFS Client,
YARN Client, and MR Client. Further, copying to HDFS requires the NameNode to
be up and the DataNodes to be functional.
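These preconditions can be written down as a simple check. The sketch below is illustrative only: the component identifiers, the function name missing_prerequisites, and the reading of "DataNodes to be functional" as "at least one live DataNode" are assumptions, not Ambari behavior.

```python
# Hypothetical identifiers for the four clients the copying host needs.
REQUIRED_CLIENTS = frozenset(
    {"TEZ_CLIENT", "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT"}
)


def missing_prerequisites(host_components, namenode_up, live_datanodes):
    """Return a list of reasons why this host cannot copy tez.tar.gz to HDFS.

    An empty list means every precondition named in the analysis is met.
    Assumption: one live DataNode is treated as "functional".
    """
    problems = sorted(
        "missing " + client for client in REQUIRED_CLIENTS - set(host_components)
    )
    if not namenode_up:
        problems.append("NameNode is not up")
    if live_datanodes < 1:
        problems.append("no functional DataNodes")
    return problems


if __name__ == "__main__":
    # A host with ResourceManager but no clients, checked before HDFS is ready.
    print(missing_prerequisites({"RESOURCEMANAGER"}, False, 0))
```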
AMBARI-9997 had ResourceManager copy the tez tarball; the problem was that if
the host with RM didn't have Tez client, it wouldn't find the tarball.
The change I'm proposing is as follows:
* Have HistoryServer copy the tarball instead of RM. This is more efficient
during RU (Rolling Upgrade), since there is only one MR HistoryServer versus
multiple RMs.
* Installing Tez also requires the YARN service, including HistoryServer.
HistoryServer is now co-hosted with Tez Client, which guarantees it can copy
the tarball.
* Installing HistoryServer by itself will not copy the tarball. However, if Tez
is installed later, then its Service Check is responsible for copying the
tarball to HDFS, and the host running it is also guaranteed to have HDFS Client.
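The proposed division of responsibility can be summarized as a small decision function. This is only a sketch of the rules listed above; the event labels and the function name should_copy_tarball are hypothetical and do not correspond to actual Ambari script names.

```python
def should_copy_tarball(event, tez_installed):
    """Decide whether the given event copies tez.tar.gz to HDFS.

    event: "HISTORYSERVER_START" or "TEZ_SERVICE_CHECK" (hypothetical labels).
    tez_installed: whether the Tez service is part of the cluster.
    """
    if event == "HISTORYSERVER_START":
        # HistoryServer is co-hosted with Tez Client only when Tez is installed;
        # without Tez there is no local tarball to copy.
        return tez_installed
    if event == "TEZ_SERVICE_CHECK":
        # Covers the case where YARN was installed first and Tez was added later.
        return True
    return False


if __name__ == "__main__":
    # Use case 1: Tez (and therefore YARN) installed together.
    print(should_copy_tarball("HISTORYSERVER_START", tez_installed=True))
    # Use case 2: YARN installed first, Tez added later.
    print(should_copy_tarball("HISTORYSERVER_START", tez_installed=False))
    print(should_copy_tarball("TEZ_SERVICE_CHECK", tez_installed=True))
```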