[ https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276739#comment-16276739 ]

ASF GitHub Bot commented on MAPREDUCE-7017:
-------------------------------------------

GitHub user jiayuhan-it opened a pull request:

    https://github.com/apache/hadoop/pull/309

    MAPREDUCE-7017:Too many times of meaningless invocation in 
TaskAttemptImpl#resolveHosts

    MRAppMaster uses TaskAttemptImpl::resolveHosts to determine the 
dataLocalHosts for each task when the data split locations are IP addresses. 
This calls InetAddress::getByName (taskNum * dfsReplication) times, and most 
of these calls are redundant because the same hosts appear in many splits. 
When the job has a large number of tasks and DNS resolution is slow, this 
stage can take a long time before the job starts running.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiayuhan-it/hadoop trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #309
    
----
commit fd42cd960bcfc40fd479d9af0109b15f21a11811
Author: jiayuhan <[email protected]>
Date:   2017-12-04T10:46:42Z

    Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

----


> Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7017
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7017
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 3.0.0-alpha4
>            Reporter: jiayuhan-it
>         Attachments: MAPREDUCE-7017.001.patch
>
>
>   MRAppMaster uses {{TaskAttemptImpl::resolveHosts}} to determine the 
> dataLocalHosts for each task when the data split locations are IP addresses. 
> This calls {{InetAddress::getByName}} (taskNum * dfsReplication) times, and 
> most of these calls are redundant because the same hosts appear in many 
> splits. When the job has a large number of tasks and DNS resolution is slow, 
> this stage can take a long time before the job starts running.
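Since the redundancy comes from resolving the same IP once per task replica, one way to remove it is to memoize the lookup so each distinct IP is resolved at most once. The sketch below is NOT the code from PR #309; the class `CachingHostResolver`, its injectable lookup function, and the simplified IPv4-only `isIP` check are all hypothetical, chosen to illustrate the idea. The default lookup uses the real `InetAddress.getByName(...).getHostName()` calls and falls back to the original string on failure.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not Hadoop's TaskAttemptImpl): resolves split host
 * strings and memoizes results, so each distinct IP is looked up once
 * instead of once per task replica.
 */
final class CachingHostResolver {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> lookup;

  /** Default: resolve via InetAddress, keeping the input on failure. */
  public CachingHostResolver() {
    this(CachingHostResolver::dnsLookup);
  }

  /** Injectable lookup, e.g. to count calls in a test. */
  public CachingHostResolver(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  static String dnsLookup(String src) {
    try {
      return InetAddress.getByName(src).getHostName();
    } catch (UnknownHostException e) {
      return src; // fall back to the original string if resolution fails
    }
  }

  /** Resolves IP entries of src; the lookup runs only on cache misses. */
  public String[] resolveHosts(String[] src) {
    String[] result = new String[src.length];
    for (int i = 0; i < src.length; i++) {
      result[i] = isIP(src[i]) ? cache.computeIfAbsent(src[i], lookup) : src[i];
    }
    return result;
  }

  // Simplified IPv4 check; an assumption, not Hadoop's actual isIP logic.
  static boolean isIP(String s) {
    return s.matches("\\d{1,3}(\\.\\d{1,3}){3}");
  }
}
```

With this approach the number of lookups is bounded by the number of distinct IPs in the splits rather than by taskNum * dfsReplication. The cache here is unbounded and never expires, which assumes host names stay stable for the lifetime of one application master.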



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
