[
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hitesh Shah resolved TEZ-1924.
------------------------------
Resolution: Fixed
Fix Version/s: 0.5.4
Committed to master, branch 0.5 and branch 0.6. Thanks for your contribution
[~ivanmi]
> Tez AM does not register with AM with full FQDN causing jobs to fail in some
> environments
> -----------------------------------------------------------------------------------------
>
> Key: TEZ-1924
> URL: https://issues.apache.org/jira/browse/TEZ-1924
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Reporter: Ivan Mitic
> Assignee: Ivan Mitic
> Fix For: 0.5.4
>
> Attachments: TEZ-1924.2.patch, TEZ-20.patch
>
>
> Issue originally reported by [~Karam Singh].
> All OrderWordCount, WordCount and Tez tests faultTolerance system tests
> failed due to java.net.UnknownHostException
> Interesting other tez examples such as mrrsleep, randomwriter,
> randomtextwriter, sort, join_inner, join_outer, terasort,
> groupbyorderbymrrtest ran fine
> one such example is following
> {code}
> RUNNING: /usr/lib/hadoop/bin/hadoop jar
> /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount
> "-DUSE_TEZ_SESSION=true" "-Dmapreduce.map.memory.mb=2048"
> "-Dtez.am.shuffle-vertex-manager.max-src-fraction=0"
> "-Dmapreduce.reduce.memory.mb=2048" "-Dmapreduce.framework.name=yarn-tez"
> "-Dtez.am.container.reuse.enabled=false" "-Dtez.am.log.level=DEBUG"
> "-Dmapreduce.map.java.opts=-Xmx1024m"
> "-Dtez.am.shuffle-vertex-manager.min-src-fraction=0"
> "-Dmapreduce.job.reduce.slowstart.completedmaps=0.01"
> "-Dmapreduce.reduce.java.opts=-Xmx1024m"
> "-Dtez.am.container.session.delay-allocation-millis=120000"
> /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1
> /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2
> -generateSplitsInClient true
> 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address:
> http://0.0.0.0:8188/ws/v1/timeline/
> 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at
> headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
> 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History
> server at /0.0.0.0:10200
> 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at
> 60 second(s).
> 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics
> system started
> 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging
> directory
> wasb://[email protected]/user/hrt_qa/.staging/application_1418977790315_0016
> are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
> 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
> 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address:
> http://0.0.0.0:8188/ws/v1/timeline/
> 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at
> headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
> 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History
> server at /0.0.0.0:10200
> 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application
> application_1418977790315_0016
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount
> DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1,
> outputPath=/user/hrt_qa/Tez_CROutput_1
> 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits,
> splitsDir=wasb://[email protected]/user/hrt_qa/.staging/application_1418977790315_0016
> 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process :
> 20
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to
> get into ready state
> 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via
> proxy
> org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException:
> java.net.UnknownHostException: Invalid host name: local host is: (unknown);
> destination host is: "workernode1":59575; java.net.UnknownHostException; For
> more details see: http://wiki.apache.org/hadoop/UnknownHost
> at
> org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at
> org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: com.google.protobuf.ServiceException:
> java.net.UnknownHostException: Invalid host name: local host is: (unknown);
> destination host is: "workernode1":59575; java.net.UnknownHostException; For
> more details see: http://wiki.apache.org/hadoop/UnknownHost
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
> at com.sun.proxy.$Proxy24.getAMStatus(Unknown Source)
> at
> org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:337)
> ... 14 more
> Caused by: java.net.UnknownHostException: Invalid host name: local host is:
> (unknown); destination host is: "workernode1":59575;
> java.net.UnknownHostException; For more details see:
> http://wiki.apache.org/hadoop/UnknownHost
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
> at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> ... 16 more
> Caused by: java.net.UnknownHostException
> ... 21 more
> ....................
> ....................
> Caused by: java.net.UnknownHostException: Invalid host name: local host is:
> (unknown); destination host is: "workernode1":59575;
> java.net.UnknownHostException; For more details see:
> http://wiki.apache.org/hadoop/UnknownHost
> at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown
> Source)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
> at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> ... 16 more
> Caused by: java.net.UnknownHostException
> ... 21 more
> 14/12/19 09:25:19 ERROR examples.OrderedWordCount: Error occurred when
> submitting/running DAGs
> java.lang.RuntimeException: TezSession has already shutdown
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at
> org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/12/19 09:25:19 INFO examples.OrderedWordCount: Shutting down session
> 14/12/19 09:25:19 INFO client.TezSession: Shutting down Tez Session,
> sessionName=OrderedWordCountSession,
> applicationId=application_1418977790315_0016
> 14/12/19 09:25:19 INFO client.TezSession: Failed to shutdown Tez Session via
> proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running,
> applicationId=application_1418977790315_0016, yarnApplicationState=FINISHED,
> finalApplicationStatus=SUCCEEDED,
> trackingUrl=http://headnode0.humb-tez1-ssh.d5.internal.cloudapp.net:8088/proxy/application_1418977790315_0016/A
> at
> org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
> at org.apache.tez.client.TezSession.stop(TezSession.java:281)
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:524)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at
> org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/12/19 09:25:19 INFO client.TezSession: Could not connect to AM, killing
> session via YARN, sessionName=OrderedWordCountSession,
> applicationId=application_1418977790315_0016
> 14/12/19 09:25:19 INFO impl.YarnClientImpl: Killed application
> application_1418977790315_0016
> java.lang.RuntimeException: TezSession has already shutdown
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
> at
> org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> at
> org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
> Contents of /etc/hosts are:
> {code}
> 127.0.0.1 localhost
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> {code}
> and contents of resolv.conf are:
> {code}
> # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
> # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
> nameserver 168.63.129.16
> search humb-tez1-ssh.d5.internal.cloudapp.net
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)