[
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276739#comment-16276739
]
ASF GitHub Bot commented on MAPREDUCE-7017:
-------------------------------------------
GitHub user jiayuhan-it opened a pull request:
https://github.com/apache/hadoop/pull/309
MAPREDUCE-7017:Too many times of meaningless invocation in
TaskAttemptImpl#resolveHosts
MRAppMaster uses TaskAttemptImpl::resolveHosts to determine the
dataLocalHosts for each task. When the data split locations are given as IP
addresses, this calls InetAddress::getByName up to taskNum * dfsReplication
times, and most of these calls are redundant because they resolve addresses
that have already been resolved. When a job has a large number of tasks and
DNS resolution is slow, this stage takes a long time before the job starts
running.
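One way to avoid the repeated lookups is to remember each resolved address.
The sketch below is only an illustration of that idea, not the code in this
pull request: CachedHostResolver, resolveAll and dnsLookup are hypothetical
names, and the lookup function is injected so the caching can be exercised
without a real DNS server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: memoizes address-to-hostname lookups. */
final class CachedHostResolver {
    // One entry per distinct location, shared by every task that asks.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> lookup;

    CachedHostResolver(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Assumed default lookup: resolve via InetAddress, fall back to the input. */
    static String dnsLookup(String ip) {
        try {
            return InetAddress.getByName(ip).getHostName();
        } catch (UnknownHostException e) {
            return ip;
        }
    }

    /** Invokes the lookup only the first time a given location is seen. */
    String resolve(String location) {
        return cache.computeIfAbsent(location, lookup);
    }

    /** Resolves all locations of one split, dropping duplicates. */
    Set<String> resolveAll(String[] locations) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String location : locations) {
            hosts.add(resolve(location));
        }
        return hosts;
    }
}
```

With new CachedHostResolver(CachedHostResolver::dnsLookup) shared across
tasks, the number of lookups is bounded by the number of distinct addresses
rather than by taskNum * dfsReplication.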
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiayuhan-it/hadoop trunk
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hadoop/pull/309.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #309
----
commit fd42cd960bcfc40fd479d9af0109b15f21a11811
Author: jiayuhan
Date: 2017-12-04T10:46:42Z
Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
----
> Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-7017
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7017
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am
> Affects Versions: 3.0.0-alpha4
> Reporter: jiayuhan-it
> Attachments: MAPREDUCE-7017.001.patch
>
>
> MRAppMaster uses {{TaskAttemptImpl::resolveHosts}} to determine the
> dataLocalHosts for each task. When the data split locations are given as IP
> addresses, this calls {{InetAddress::getByName}} up to
> taskNum * dfsReplication times, and most of these calls are redundant
> because they resolve addresses that have already been resolved. When a job
> has a large number of tasks and DNS resolution is slow, this stage takes a
> long time before the job starts running.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)