[ https://issues.apache.org/jira/browse/HDFS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509310#comment-13509310 ]

Colin Patrick McCabe commented on HDFS-4253:
--------------------------------------------

More about that {{compare}} method: I re-read it with an eye to whether it 
satisfies all the comparator requirements outlined in Effective Java 
(summarized here: http://www.javapractices.com/topic/TopicAction.do?Id=10), and 
I don't think it does.  In particular, it has to satisfy the requirement that 
if {{compare(a, b) < 0}}, then {{compare(b, a) > 0}}.
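To make the requirement concrete, here is a minimal, self-contained sketch that checks sgn(compare(a, b)) == -sgn(compare(b, a)) over every pair in a small sample. The {{Replica}} class and the {{broken}}/{{fixed}} comparators are hypothetical stand-ins, not the patch's code: {{broken}} only mimics the "return 1 when neither node is on the local rack" behavior, and {{fixed}} breaks that tie by name instead.

```java
import java.util.Comparator;
import java.util.List;

public class ComparatorContractCheck {
    // Hypothetical stand-in for a replica location: a node name plus its rack.
    public static final class Replica {
        public final String name;
        public final String rack;

        public Replica(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }
    }

    /** True if sgn(c.compare(a, b)) == -sgn(c.compare(b, a)) for every pair in items. */
    public static <T> boolean isAntisymmetric(Comparator<T> c, List<T> items) {
        for (T a : items) {
            for (T b : items) {
                if (Integer.signum(c.compare(a, b)) != -Integer.signum(c.compare(b, a))) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Hypothetical comparator mimicking the bug: two off-rack replicas always compare as 1. */
    public static Comparator<Replica> broken(String localRack) {
        return (a, b) -> {
            if (a == b) return 0;
            boolean aLocal = a.rack.equals(localRack);
            boolean bLocal = b.rack.equals(localRack);
            if (aLocal != bLocal) return aLocal ? -1 : 1;
            if (aLocal) return 0;  // both local: treated as equal in this sketch
            return 1;              // neither local: 1 regardless of argument order
        };
    }

    /** Same rack rule, but ties are broken by name, so swapping arguments flips the sign. */
    public static Comparator<Replica> fixed(String localRack) {
        return (a, b) -> {
            boolean aLocal = a.rack.equals(localRack);
            boolean bLocal = b.rack.equals(localRack);
            if (aLocal != bLocal) return aLocal ? -1 : 1;
            return a.name.compareTo(b.name);
        };
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
            new Replica("dn1", "/rack1"),
            new Replica("dn2", "/rack2"),
            new Replica("dn3", "/rack3"));
        System.out.println("broken antisymmetric: " + isAntisymmetric(broken("/rack1"), replicas)); // false
        System.out.println("fixed antisymmetric: " + isAntisymmetric(fixed("/rack1"), replicas));   // true
    }
}
```

With dn2 and dn3 both off-rack, {{broken}} returns 1 for both argument orders, which is exactly the violation described above.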

I understand now why you special-cased a == b: it guarantees that in the next 
few lines, where you test a == rdr and b == rdr, the two tests can't both be 
true.  While this is correct, it's also non-obvious, so I would suggest either 
adding a comment or simply testing {{if (aIsLocal && bIsLocal)}} directly.

The bug comes later, where you always return 1 if neither Node is on the local 
rack.  This is wrong: it violates antisymmetry (see link), because for two 
off-rack nodes both compare(a, b) and compare(b, a) return 1.  I would suggest 
something along these lines:

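{code}
// One salt per comparator instance (Random.nextInt() is an instance method),
// so different readers order equidistant replicas differently.
private final int salt = new Random().nextInt();
...
@Override
public int compare(Node a, Node b) {
  return ComparisonChain.start()
     // false sorts before true: the reader's own node comes first
     .compare(a != rdr, b != rdr)
     // then nodes on the reader's rack
     .compare(!isOnSameRack(a, rdr), !isOnSameRack(b, rdr))
     // then a salted hash, to spread reads across equidistant replicas
     .compare(a.hashCode() ^ salt, b.hashCode() ^ salt)
     // finally a deterministic tie-break
     .compare(a.getName(), b.getName())
     .result();
}
{code}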
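The same ordering can be sketched with only the JDK's {{Comparator}} combinators, which makes it easy to try outside Hadoop. This is an illustration under assumptions: {{Node}} here is a hypothetical two-field stand-in (name and rack path), "same rack" is simplified to equal rack strings, and the salted hash is taken over the node name rather than any Hadoop-provided hash.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SaltedReplicaOrder {
    // Hypothetical minimal stand-in for a Hadoop Node: a name plus a rack path.
    public static final class Node {
        public final String name;
        public final String rack;

        public Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /**
     * Reader's own node first, then nodes on the reader's rack, then the rest.
     * Within each tier, a salted hash of the name spreads load across readers
     * that use different salts; the name itself is the final tie-break.
     */
    public static Comparator<Node> byDistance(Node reader, int salt) {
        return Comparator
            .comparing((Node n) -> n != reader)                // false sorts first: the reader itself
            .thenComparing(n -> !n.rack.equals(reader.rack))   // then same-rack nodes
            .thenComparingInt(n -> n.name.hashCode() ^ salt)   // salted hash within a tier
            .thenComparing(n -> n.name);                       // deterministic final tie-break
    }

    /** Returns a sorted copy; the input list is left unchanged. */
    public static List<Node> sorted(List<Node> replicas, Node reader, int salt) {
        List<Node> copy = new ArrayList<>(replicas);
        copy.sort(byDistance(reader, salt));
        return copy;
    }

    public static void main(String[] args) {
        Node reader = new Node("dn1", "/rack1");
        List<Node> replicas = List.of(
            new Node("dn4", "/rack2"),
            new Node("dn2", "/rack1"),
            reader,
            new Node("dn3", "/rack1"));
        Random random = new Random();
        // Each simulated reader draws its own salt, so dn2 and dn3 may swap places.
        for (int i = 0; i < 3; i++) {
            System.out.println(sorted(replicas, reader, random.nextInt()));
        }
    }
}
```

Because the salt is fixed for the lifetime of one comparator, each sort sees a consistent total order, which is what makes it legal to pass to a sort; drawing fresh randomness inside compare() itself would break the contract again.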
                
> block replica reads get hot-spots due to NetworkTopology#pseudoSortByDistance
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-4253
>                 URL: https://issues.apache.org/jira/browse/HDFS-4253
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: Andy Isaacson
>            Assignee: Andy Isaacson
>         Attachments: hdfs4253-1.txt, hdfs4253.txt
>
>
> When many nodes (10) read from the same block simultaneously, we get an 
> asymmetric distribution of read load.  This can result in slow block reads 
> when one replica is serving most of the readers and the other replicas are 
> idle.  The busy DN bottlenecks on its network link.
> This is especially visible with large block sizes and high replica counts (I 
> reproduced the problem with {{-Ddfs.block.size=4294967296}} and replication 
> 5), but the same behavior happens on a small scale with normal-sized blocks 
> and replication=3.
> The root of the problem is in {{NetworkTopology#pseudoSortByDistance}} which 
> explicitly does not try to spread traffic among replicas in a given rack -- 
> it only randomizes usage for off-rack replicas.
