Ahmed Hussein created HADOOP-17408:
--------------------------------------

Summary: Optimize NetworkTopology while sorting of block locations
Key: HADOOP-17408
URL: https://issues.apache.org/jira/browse/HADOOP-17408
Project: Hadoop Common
Issue Type: Improvement
Components: common, net
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein

In {{NetworkTopology}}, I noticed some low-hanging fruit for improving performance.

Inside {{sortByDistance}}, {{Collections.shuffle}} is performed on the list before calling {{secondarySort}}.

{code:java}
Collections.shuffle(list, r);
if (secondarySort != null) {
  secondarySort.accept(list);
}
{code}

However, at various call sites, {{Collections.shuffle}} is itself passed as the {{secondarySort}} argument to {{sortByDistance}}. This means that the shuffle is executed twice on each list. Logically, it is also pointless to shuffle before applying a tie breaker, since the tie breaker may discard the order that the shuffle produced.

In addition, [~daryn] reported that:
* the topology is unnecessarily locking/unlocking to calculate the distance for every node
* shuffling uses a seeded {{Random}}, which is heavily synchronized, instead of {{ThreadLocalRandom}}
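
The points above can be illustrated with a minimal, self-contained sketch. It is not the actual {{NetworkTopology}} code or the proposed patch: the class {{SortByDistanceSketch}}, its lock field, and the {{distance}} parameter are hypothetical stand-ins chosen for illustration. The sketch assumes nodes are bucketed by distance, takes the read lock once for the whole pass rather than once per node, applies the caller's tie breaker exactly once per bucket, and falls back to a {{ThreadLocalRandom}} shuffle only when no tie breaker is supplied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

/** Illustrative sketch only; names here are assumptions, not Hadoop APIs. */
final class SortByDistanceSketch {
  // Hypothetical stand-in for the topology lock.
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  <T> List<T> sortByDistance(List<T> nodes, ToIntFunction<T> distance,
      Consumer<List<T>> secondarySort) {
    // Bucket nodes by distance; TreeMap iterates keys in ascending order.
    TreeMap<Integer, List<T>> buckets = new TreeMap<>();
    lock.readLock().lock(); // acquired once for the whole pass, not per node
    try {
      for (T node : nodes) {
        buckets.computeIfAbsent(distance.applyAsInt(node),
            k -> new ArrayList<>()).add(node);
      }
    } finally {
      lock.readLock().unlock();
    }

    List<T> sorted = new ArrayList<>(nodes.size());
    for (List<T> bucket : buckets.values()) {
      if (bucket.size() > 1) {
        if (secondarySort != null) {
          // Caller-supplied tie breaker, applied once with no prior shuffle.
          secondarySort.accept(bucket);
        } else {
          // Default tie breaker: shuffle with the per-thread generator.
          Collections.shuffle(bucket, ThreadLocalRandom.current());
        }
      }
      sorted.addAll(bucket);
    }
    return sorted;
  }
}
```

With this shape, a caller that wants random ordering among equidistant nodes can pass {{null}} (or its own shuffle) and the list is shuffled only once, while a caller with a deterministic tie breaker does not pay for a shuffle whose result would be overwritten.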