Ahmed Hussein created HADOOP-17408:
--------------------------------------

             Summary: Optimize NetworkTopology while sorting of block locations
                 Key: HADOOP-17408
                 URL: https://issues.apache.org/jira/browse/HADOOP-17408
             Project: Hadoop Common
          Issue Type: Improvement
          Components: common, net
            Reporter: Ahmed Hussein
            Assignee: Ahmed Hussein


In {{NetworkTopology}}, I noticed some low-hanging fruit for improving 
performance.

Inside {{sortByDistance}}, {{Collections.shuffle}} is performed on the list 
before calling {{secondarySort}}.

{code:java}
Collections.shuffle(list, r);
if (secondarySort != null) {
  secondarySort.accept(list);
}
{code}

However, at several call sites, {{Collections.shuffle}} is itself passed as the 
{{secondarySort}} argument to {{sortByDistance}}. This means that the shuffle is 
executed twice on each list.
Also, logically, it is pointless to shuffle before applying a tie breaker, 
since the tie breaker may reorder the list and discard the shuffle's work.
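
One possible way to avoid the double shuffle is sketched below. It is a simplified, hypothetical stand-in for the tie-breaking step, not the actual {{NetworkTopology}} code: the class {{TieBreakSketch}}, both method names, and the {{defaultShuffles}} counter are assumptions made for illustration. The idea is to fall back to the default shuffle only when the caller passes no {{secondarySort}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the tie-breaking step of sortByDistance.
final class TieBreakSketch {
  // Counts how often the built-in shuffle runs (illustration only).
  static int defaultShuffles = 0;

  // Behavior described in the issue: always shuffle, then apply secondarySort.
  static <T> void shuffleThenSort(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    Collections.shuffle(list, r);
    defaultShuffles++;
    if (secondarySort != null) {
      secondarySort.accept(list);
    }
  }

  // Assumed alternative: shuffle only when no secondarySort was supplied.
  static <T> void sortOrShuffle(List<T> list, Random r,
      Consumer<List<T>> secondarySort) {
    if (secondarySort != null) {
      secondarySort.accept(list);
    } else {
      Collections.shuffle(list, r);
      defaultShuffles++;
    }
  }

  public static void main(String[] args) {
    List<String> nodes =
        new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3", "dn4"));
    Random r = new Random();
    // A caller that passes a shuffle as its secondarySort.
    Consumer<List<String>> callerShuffle = Collections::shuffle;

    shuffleThenSort(nodes, r, callerShuffle);
    System.out.println("built-in shuffles (always shuffle): " + defaultShuffles);

    defaultShuffles = 0;
    sortOrShuffle(nodes, r, callerShuffle);
    System.out.println("built-in shuffles (sort or shuffle): " + defaultShuffles);
  }
}
```

With this shape, a caller that passes a shuffle as {{secondarySort}} triggers only its own shuffle, and a caller that passes null still gets a randomized order.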

In addition, [~daryn] reported that:
* the topology unnecessarily locks and unlocks to calculate the distance for 
every node
* shuffling uses a seeded {{Random}}, which is heavily synchronized, instead of 
{{ThreadLocalRandom}}
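
The two points above could be addressed roughly as in the following sketch. It is illustrative only: {{BatchedDistanceSketch}}, its lock field, and the {{unlockedDistance}} callback are hypothetical names and do not mirror the real {{NetworkTopology}} API. It takes the read lock once for the whole list rather than once per node, and shuffles with {{ThreadLocalRandom.current()}} so that threads do not contend on a single {{Random}} instance.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.ToIntFunction;

// Hypothetical sketch; names do not mirror the real NetworkTopology API.
final class BatchedDistanceSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Computes every distance under a single read-lock acquisition.
  // unlockedDistance is assumed to do no locking of its own.
  <T> int[] distances(List<T> nodes, ToIntFunction<T> unlockedDistance) {
    int[] result = new int[nodes.size()];
    lock.readLock().lock();
    try {
      for (int i = 0; i < result.length; i++) {
        result[i] = unlockedDistance.applyAsInt(nodes.get(i));
      }
    } finally {
      lock.readLock().unlock();
    }
    return result;
  }

  // Shuffles using the calling thread's own generator.
  static <T> void shuffle(List<T> list) {
    Collections.shuffle(list, ThreadLocalRandom.current());
  }
}
```

Note that {{ThreadLocalRandom}} cannot be seeded, so any caller that relies on a reproducible shuffle order would still need to supply its own {{Random}}.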




