[ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved TEZ-1924.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.5.4

Committed to master, branch 0.5 and branch 0.6. Thanks for your contribution 
[~ivanmi]

> Tez AM does not register with AM with full FQDN causing jobs to fail in some 
> environments
> -----------------------------------------------------------------------------------------
>
>                 Key: TEZ-1924
>                 URL: https://issues.apache.org/jira/browse/TEZ-1924
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.2
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>             Fix For: 0.5.4
>
>         Attachments: TEZ-1924.2.patch, TEZ-20.patch
>
>
> Issue originally reported by [~Karam Singh].
> All OrderedWordCount, WordCount, and Tez faultTolerance system tests failed
> due to java.net.UnknownHostException.
> Interestingly, other Tez examples such as mrrsleep, randomwriter,
> randomtextwriter, sort, join_inner, join_outer, terasort, and
> groupbyorderbymrrtest ran fine.
> One such failure follows:
> {code}
> RUNNING: /usr/lib/hadoop/bin/hadoop jar 
> /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
> "-DUSE_TEZ_SESSION=true" "-Dmapreduce.map.memory.mb=2048" 
> "-Dtez.am.shuffle-vertex-manager.max-src-fraction=0" 
> "-Dmapreduce.reduce.memory.mb=2048" "-Dmapreduce.framework.name=yarn-tez" 
> "-Dtez.am.container.reuse.enabled=false" "-Dtez.am.log.level=DEBUG" 
> "-Dmapreduce.map.java.opts=-Xmx1024m" 
> "-Dtez.am.shuffle-vertex-manager.min-src-fraction=0" 
> "-Dmapreduce.job.reduce.slowstart.completedmaps=0.01" 
> "-Dmapreduce.reduce.java.opts=-Xmx1024m" 
> "-Dtez.am.container.session.delay-allocation-millis=120000" 
> /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
> /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
> -generateSplitsInClient true
> 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
> 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
> 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
> 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging directory wasb://<redacted>/user/hrt_qa/.staging/application_1418977790315_0016 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
> 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
> 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
> 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application application_1418977790315_0016
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, outputPath=/user/hrt_qa/Tez_CROutput_1
> 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, splitsDir=wasb://<redacted>/user/hrt_qa/.staging/application_1418977790315_0016
> 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 20
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to get into ready state
> 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via proxy
> org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
>       at org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>       at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
>       at com.sun.proxy.$Proxy24.getAMStatus(Unknown Source)
>       at org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:337)
>       ... 14 more
> Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
>       at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>       ... 16 more
> Caused by: java.net.UnknownHostException
>       ... 21 more
> ....................
> ....................
> Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
>       at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
>       at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>       ... 16 more
> Caused by: java.net.UnknownHostException
>       ... 21 more
> 14/12/19 09:25:19 ERROR examples.OrderedWordCount: Error occurred when submitting/running DAGs
> java.lang.RuntimeException: TezSession has already shutdown
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>       at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/12/19 09:25:19 INFO examples.OrderedWordCount: Shutting down session
> 14/12/19 09:25:19 INFO client.TezSession: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1418977790315_0016
> 14/12/19 09:25:19 INFO client.TezSession: Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1418977790315_0016, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://headnode0.humb-tez1-ssh.d5.internal.cloudapp.net:8088/proxy/application_1418977790315_0016/A
>       at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
>       at org.apache.tez.client.TezSession.stop(TezSession.java:281)
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:524)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>       at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/12/19 09:25:19 INFO client.TezSession: Could not connect to AM, killing session via YARN, sessionName=OrderedWordCountSession, applicationId=application_1418977790315_0016
> 14/12/19 09:25:19 INFO impl.YarnClientImpl: Killed application application_1418977790315_0016
> java.lang.RuntimeException: TezSession has already shutdown
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
>       at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>       at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code} 
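In the trace above, the root cause is a bare `java.net.UnknownHostException` raised through `NetUtils.wrapException` from the `org.apache.hadoop.ipc.Client$Connection` constructor, with the local host reported as `(unknown)`. That pattern suggests the IPC client gave up on the AM address `"workernode1":59575` before opening any socket, most likely because the name did not resolve on the client machine. The following minimal sketch reproduces only that kind of check with plain JDK classes; it is not Hadoop's code, and the class name `UnresolvedCheck` is hypothetical.

```java
import java.net.InetSocketAddress;

/**
 * Hypothetical helper: mimics a pre-connect "is this address resolvable?"
 * check. Constructing an InetSocketAddress from a host name triggers a name
 * lookup; if the lookup fails, the address is marked unresolved, and an RPC
 * client that checks for this would fail before attempting a connection.
 */
public final class UnresolvedCheck {
    private UnresolvedCheck() {}

    /** Returns true if this JVM could not resolve the given host name. */
    public static boolean isUnresolved(String host, int port) {
        // The (String, int) constructor attempts to resolve the host name.
        InetSocketAddress addr = new InetSocketAddress(host, port);
        return addr.isUnresolved();
    }

    public static void main(String[] args) {
        // Defaults mirror the destination reported in the log above.
        String host = args.length > 0 ? args[0] : "workernode1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 59575;
        if (isUnresolved(host, port)) {
            System.out.println("Unresolved: " + host + ":" + port
                + " (an RPC client would report UnknownHostException here)");
        } else {
            System.out.println("Resolved: " + host + ":" + port);
        }
    }
}
```

Running it with `workernode1 59575` on the client host would show whether the JVM there can resolve the short name that the AM registered.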
> Contents of /etc/hosts are:
> {code}
> 127.0.0.1 localhost
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> {code}
> and the contents of /etc/resolv.conf are:
> {code}
> # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
> #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
> nameserver 168.63.129.16
> search humb-tez1-ssh.d5.internal.cloudapp.net
> {code}
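With only `localhost` listed in `/etc/hosts`, every other name must come from DNS, so a short name such as `workernode1` resolves only if the resolver applies the `search` suffix. As the issue title notes, the AM registered without its fully qualified name. The sketch below contrasts `InetAddress.getHostName()` with `InetAddress.getCanonicalHostName()` for the local host and applies a simple rule for choosing a qualified name to advertise. It is an illustration under those assumptions, not the change committed for TEZ-1924; `HostNames` and `preferQualified` are hypothetical names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Illustrative sketch (not the TEZ-1924 patch): compare the local host name
 * reported by the JVM with its canonical, reverse-resolved form, and pick a
 * fully qualified name to advertise when one is available.
 */
public final class HostNames {
    private HostNames() {}

    /**
     * Hypothetical policy: return canonicalName if it looks like a dotted DNS
     * name (not an IP literal); otherwise fall back to shortName.
     */
    public static String preferQualified(String shortName, String canonicalName) {
        if (canonicalName == null || canonicalName.isEmpty()) {
            return shortName;
        }
        boolean dotted = canonicalName.indexOf('.') > 0;
        return (dotted && !isIpLiteral(canonicalName)) ? canonicalName : shortName;
    }

    /** Rough IP-literal test: any colon (IPv6) or four dotted decimal groups (IPv4). */
    private static boolean isIpLiteral(String s) {
        return s.indexOf(':') >= 0 || s.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        // May be a short name such as "workernode1", depending on system configuration.
        String hostName = local.getHostName();
        // Best-effort reverse lookup; may fall back to the textual IP address.
        String canonical = local.getCanonicalHostName();
        System.out.println("getHostName():          " + hostName);
        System.out.println("getCanonicalHostName(): " + canonical);
        System.out.println("name to advertise:      " + preferQualified(hostName, canonical));
    }
}
```

The dotted-name test is deliberately crude: it treats any non-IP name containing a dot as qualified, which is enough to show the distinction but is not a full FQDN validator.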



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
