[ 
https://issues.apache.org/jira/browse/HADOOP-17408?focusedWorklogId=519789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519789
 ]

ASF GitHub Bot logged work on HADOOP-17408:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 17:48
            Start Date: 03/Dec/20 17:48
    Worklog Time Spent: 10m 
      Work Description: amahussein opened a new pull request #2514:
URL: https://github.com/apache/hadoop/pull/2514






Issue Time Tracking
-------------------

            Worklog Id:     (was: 519789)
    Remaining Estimate: 0h
            Time Spent: 10m

> Optimize NetworkTopology while sorting of block locations
> ---------------------------------------------------------
>
>                 Key: HADOOP-17408
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17408
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, net
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In {{NetworkTopology}}, I noticed some low-hanging fruit for improving 
> performance.
> Inside {{sortByDistance}}, {{Collections.shuffle}} is performed on the list 
> before calling {{secondarySort}}.
> {code:java}
> Collections.shuffle(list, r);
> if (secondarySort != null) {
>   secondarySort.accept(list);
> }
> {code}
> However, several call sites pass {{Collections.shuffle}} as the 
> {{secondarySort}} argument to {{sortByDistance}}, so the shuffle is executed 
> twice on each list.
> Also, logically, there is little point in shuffling before applying a tie 
> breaker that may reorder the list and make the shuffle's work obsolete.
> In addition, [~daryn] reported that:
> * the topology unnecessarily locks and unlocks to calculate the distance for 
> every node
> * shuffling uses a seeded {{Random}}, which is heavily synchronized, instead 
> of {{ThreadLocalRandom}}
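
The three points in the description (the double shuffle, the per-node locking, and the seeded Random) can be illustrated together with a minimal sketch. This is not the actual {{NetworkTopology}} code and not necessarily the change made in the pull request: the class name SortByDistanceSketch, the generic signature, and the distance function parameter are hypothetical and exist only for illustration. It assumes the caller passes a mutable list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

/** Hypothetical sketch; not the real NetworkTopology implementation. */
public class SortByDistanceSketch {
  // Stands in for the topology lock mentioned in the issue (assumption).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /**
   * Reorders nodes in place by ascending distance. Nodes at the same
   * distance are ordered by secondarySort when it is given, otherwise
   * by exactly one shuffle.
   */
  public <T> void sortByDistance(List<T> nodes, ToIntFunction<T> distance,
      Consumer<List<T>> secondarySort) {
    // TreeMap iterates its keys in ascending order of distance.
    Map<Integer, List<T>> buckets = new TreeMap<>();

    // Take the read lock once for the whole pass instead of once per node.
    lock.readLock().lock();
    try {
      for (T node : nodes) {
        buckets.computeIfAbsent(distance.applyAsInt(node),
            k -> new ArrayList<>()).add(node);
      }
    } finally {
      lock.readLock().unlock();
    }

    int idx = 0;
    for (List<T> bucket : buckets.values()) {
      if (secondarySort != null) {
        // The tie breaker decides the order; no shuffle beforehand.
        secondarySort.accept(bucket);
      } else {
        // ThreadLocalRandom avoids sharing one Random across threads.
        Collections.shuffle(bucket, ThreadLocalRandom.current());
      }
      for (T node : bucket) {
        nodes.set(idx++, node);
      }
    }
  }
}
```

With this arrangement, a call site that only wants random tie-breaking would pass null rather than a shuffle as the secondary sort, so each bucket is shuffled at most once.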



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
