[
https://issues.apache.org/jira/browse/TEZ-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592009#comment-14592009
]
Johannes Zillmann commented on TEZ-2561:
----------------------------------------
So it seems that good old MapReduce shows similar symptoms.
Looking at the AppMaster log, I see:
{code}
2015-06-18 16:14:02,709 INFO [Socket Reader #1 for port 52195]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 52195
2015-06-18 16:14:03,575 INFO [Socket Reader #1 for port 52197]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 52197
{code}
Once I specified yarn.app.mapreduce.am.job.client.port-range=31000-32000, I see:
{code}
2015-06-18 16:16:26,920 INFO [Socket Reader #1 for port 31000]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 31000
2015-06-18 16:16:27,650 INFO [Socket Reader #1 for port 52295]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 52295
{code}
followed by log entries such as:
{code}
2015-06-18 16:16:34,051 INFO [IPC Server handler 1 on 52295]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request from
attempt_1434557883115_0021_m_000000_0
2015-06-18 16:16:34,052 INFO [IPC Server handler 1 on 52295]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit for
attempt_1434557883115_0021_m_000000_0:true
{code}
So it seems MapReduce's TaskAttemptListenerImpl suffers from the same issue as
Tez's TaskAttemptListenerImpTezDag. Looking at the code, the listener seems to
be explicitly bound to 0.0.0.0.
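
To illustrate the difference visible in the logs above (one server on port 31000 inside the configured range, the other on an arbitrary port such as 52295), here is a minimal sketch using only java.net. It is not the Hadoop or Tez code; the class PortRangeBinder and both helper methods are hypothetical names, and the assumption is that the unconfigured listener asks the OS for any free port while a range-aware one walks the range.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRangeBinder {

    /** Binds to the wildcard address with port 0, so the OS picks any free (ephemeral) port. */
    public static ServerSocket bindEphemeral() throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress("0.0.0.0", 0));
        return socket;
    }

    /** Hypothetical helper: tries each port in [low, high] and returns the first one that binds. */
    public static ServerSocket bindInRange(int low, int high) throws IOException {
        IOException last = null;
        for (int port = low; port <= high; port++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress("0.0.0.0", port));
                return socket;
            } catch (IOException e) {
                // Port is taken or not permitted; release the socket and try the next one.
                socket.close();
                last = e;
            }
        }
        throw new IOException("No free port in range " + low + "-" + high, last);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ephemeral = bindEphemeral();
             ServerSocket ranged = bindInRange(31000, 32000)) {
            System.out.println("ephemeral port: " + ephemeral.getLocalPort());
            System.out.println("ranged port:    " + ranged.getLocalPort());
        }
    }
}
```

With a firewall that only opens 31000-32000, only the second kind of socket is reachable from other nodes, which would match the connect retries reported by the tasks.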
> Port for TaskAttemptListenerImpTezDag should be configurable
> ------------------------------------------------------------
>
> Key: TEZ-2561
> URL: https://issues.apache.org/jira/browse/TEZ-2561
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Johannes Zillmann
> Attachments: TEZ-2561-1.patch
>
>
> Noticed sporadic DAG failures in our EC2 test environment.
> Tasks fail with the following log:
> {noformat}
> 2015-06-17 11:19:51,064 INFO [main] impl.MetricsSystemImpl: Scheduled
> snapshot period at 10 second(s).
> 2015-06-17 11:19:51,064 INFO [main] impl.MetricsSystemImpl: TezTask metrics
> system started
> 2015-06-17 11:19:51,259 INFO [TezChild] task.ContainerReporter: Attempting to
> fetch new task
> 2015-06-17 11:20:11,311 INFO [TezChild] ipc.Client: Retrying connect to
> server: ip-10-149-102-100.ec2.internal/10.149.102.100:60630. Already tried 0
> time(s); maxRetries=5
> 2015-06-17 11:20:31,312 INFO [TezChild] ipc.Client: Retrying connect to
> server: ip-10-149-102-100.ec2.internal/10.149.102.100:60630. Already tried 1
> time(s); maxRetries=5
> 2015-06-17 11:20:51,313 INFO [TezChild] ipc.Client: Retrying connect to
> server: ip-10-149-102-100.ec2.internal/10.149.102.100:60630. Already tried 2
> time(s); maxRetries=5
> 2015-06-17 11:21:11,314 INFO [TezChild] ipc.Client: Retrying connect to
> server: ip-10-149-102-100.ec2.internal/10.149.102.100:60630. Already tried 3
> time(s); maxRetries=5
> 2015-06-17 11:21:31,315 INFO [TezChild] ipc.Client: Retrying connect to
> server: ip-10-149-102-100.ec2.internal/10.149.102.100:60630. Already tried 4
> time(s); maxRetries=5
> 2015-06-17 11:21:51,317 INFO [main] impl.MetricsSystemImpl: Stopping TezTask
> metrics system...
> 2015-06-17 11:21:51,318 INFO [main] impl.MetricsSystemImpl: TezTask metrics
> system stopped.
> 2015-06-17 11:21:51,318 INFO [main] impl.MetricsSystemImpl: TezTask metrics
> system shutdown complete.
> {noformat}
> From the AppMaster:
> {noformat}
> Created DAGAppMaster for application appattempt_1434553606315_0022_000001
> 2015-06-17 11:19:43,655 INFO [Socket Reader #1 for port 60630] ipc.Server:
> Starting Socket Reader #1 for port 60630
> 2015-06-17 11:19:43,656 INFO [Socket Reader #1 for port 31001] ipc.Server:
> Starting Socket Reader #1 for port 31001
> 2015-06-17 11:19:43,713 WARN
> [ServiceThread:org.apache.tez.dag.history.HistoryEventHandler]
> conf.Configuration: mapred-site.xml:an attempt to override final parameter:
> mapreduce.cluster.local.dir; Ignoring.
> {noformat}
> [~hitesh] mentioned it's likely the TaskAttemptListenerImpTezDag that
> starts on that port. It would be nice if the port (or port range) could be configured.