[
https://issues.apache.org/jira/browse/TEZ-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638118#comment-14638118
]
TezQA commented on TEZ-2630:
----------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12746705/TEZ-2630.2.patch
against master revision 19fb440.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/910//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/910//console
This message is automatically generated.
> TezChild receives IP address instead of FQDN
> ---------------------------------------------
>
> Key: TEZ-2630
> URL: https://issues.apache.org/jira/browse/TEZ-2630
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Rajat Jain
> Assignee: Hitesh Shah
> Priority: Critical
> Attachments: TEZ-2630.2.patch, TEZ-2630.patch
>
>
> I am running a YARN cluster on AWS. The slave nodes (NMs) are all configured
> to listen on their private DNS names. For example, a sample NodeManager
> listens on ip-10-16-141-168.ec2.internal:8042.
> When I try to run a Tez job (even a simple one like select count(*) from
> nation), it fails because the child tasks are unable to connect to the AM.
> The issue is that they try to connect to the IP address instead of the
> private DNS name.
> Here are some sample log lines (a couple of them added by me for debugging):
> {code}
> 2015-07-21 17:08:21,919 INFO [main] task.TezChild: TezChild starting
> 2015-07-21 17:08:22,310 INFO [main] task.TezChild: Using socket factory class: org.apache.hadoop.net.StandardSocketFactory
> 2015-07-21 17:08:22,336 INFO [main] task.TezChild: PID, containerIdentifier: 3699, container_1437498369268_0001_01_000002
> 2015-07-21 17:08:22,418 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> 2015-07-21 17:08:23,025 INFO [main] task.TezChild: Got host:port: 10.16.141.168:37949
> 2015-07-21 17:08:23,035 INFO [main] task.TezChild: address variables: 10.16.141.168:37949
> 2015-07-21 17:08:23,143 INFO [TezChild] task.ContainerReporter: Attempting to fetch new task
> 2015-07-21 17:08:24,201 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-07-21 17:08:25,202 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-07-21 17:08:26,757 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> 2015-07-21 17:08:27,758 INFO [TezChild] ipc.Client: Retrying connect to server: 10.16.141.168/10.16.141.168:37949. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
> {code}
> The AM is listening at the right address, but TezChild receives the IP
> address instead of the private DNS name.
> AM logs:
> {code}
> 2015-07-21 18:09:27,906 INFO [ServiceThread:org.apache.tez.dag.app.TaskAttemptListenerImpTezDag] app.TaskAttemptListenerImpTezDag: Listening at address: ip-10-234-2-80.ec2.internal:49967
> {code}
> TezChild logs:
> {code}
> 2015-07-21 18:09:35,353 INFO [main] task.TezChild: TezChild starting
> 2015-07-21 18:09:35,379 INFO [main] task.TezChild: Args: 10.234.2.80,49967,container_1437501941642_0001_01_000002,application_1437501941642_0001,1
> 2015-07-21 18:09:35,770 INFO [main] task.TezChild: Using socket factory class: org.apache.hadoop.net.StandardSocketFactory
> 2015-07-21 18:09:35,785 INFO [main] task.TezChild: PID, containerIdentifier: 8670, container_1437501941642_0001_01_000002
> 2015-07-21 18:09:35,864 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> 2015-07-21 18:09:36,403 INFO [main] task.TezChild: Got host:port: 10.234.2.80:49967
> 2015-07-21 18:09:36,413 INFO [main] task.TezChild: address variables: 10.234.2.80:49967
> {code}
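The mismatch in the logs above (the AM listening on ip-10-234-2-80.ec2.internal:49967 while TezChild is handed 10.234.2.80:49967) can be illustrated with a minimal standalone sketch. This is not Tez code and does not describe what either attached patch changes; the class and helper names are hypothetical. It only shows how a single bound java.net.InetSocketAddress yields two different strings depending on whether the IP literal or the host name is used. The address is built with InetAddress.getByAddress(host, bytes) so that no DNS lookup is needed to run it.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; none of these names come from Tez.
class AdvertisedAddress {

    // Formats the address using the numeric IP, as seen in the TezChild log.
    static String byIp(InetSocketAddress bound) {
        return bound.getAddress().getHostAddress() + ":" + bound.getPort();
    }

    // Formats the address using the host name, as seen in the AM log.
    // getHostString() does not trigger a reverse DNS lookup.
    static String byHostName(InetSocketAddress bound) {
        return bound.getHostString() + ":" + bound.getPort();
    }

    // Builds the AM address from the logs without consulting a name service.
    static InetSocketAddress exampleBoundAddress() {
        try {
            InetAddress addr = InetAddress.getByAddress(
                "ip-10-234-2-80.ec2.internal",
                new byte[] {10, (byte) 234, 2, 80});
            return new InetSocketAddress(addr, 49967);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal address length; not expected here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress bound = exampleBoundAddress();
        System.out.println(byIp(bound));       // prints 10.234.2.80:49967
        System.out.println(byHostName(bound)); // prints ip-10-234-2-80.ec2.internal:49967
    }
}
```

If the listener is reachable only through its private DNS name, as the description reports for this cluster, a child given the first form would keep retrying the IP, which matches the ipc.Client retry lines quoted above.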