[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

Daryn Sharp (JIRA) Tue, 12 Aug 2014 07:45:40 -0700

    [ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094111#comment-14094111 ]


Daryn Sharp commented on HDFS-6840:
-----------------------------------

bq.  My impression of the old pseudo-sort was that it was deterministic. AFAIK 
there wasn't a Random doing a shuffle.

I'm familiar with the block location selection and helped debug this issue.  As 
Jason points out, the sort result used to be node-local, rack-local, 
off-rack, decommissioned.  If the first location wasn't node/rack local, then it 
picked a random off-rack node.  The rest of the list, yes, was deterministic.
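
A minimal sketch of the old ordering as described above, not the actual HDFS code. The `Replica` and `Tier` types and the `pseudoSort` name are hypothetical, made up for illustration only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class OldPseudoSortSketch {
    // Hypothetical locality tiers, declared from most to least preferred.
    enum Tier { NODE_LOCAL, RACK_LOCAL, OFF_RACK, DECOMMISSIONED }

    // Hypothetical replica descriptor; not an HDFS class.
    static final class Replica {
        final String name;
        final Tier tier;
        Replica(String name, Tier tier) { this.name = name; this.tier = tier; }
    }

    static List<Replica> pseudoSort(List<Replica> locations, Random random) {
        List<Replica> sorted = new ArrayList<>(locations);
        // List.sort is stable, so replicas in the same tier keep their incoming order.
        sorted.sort((a, b) -> a.tier.compareTo(b.tier));

        // Only when the head is off-rack (no node-local or rack-local replica)
        // is one random off-rack replica swapped into the first position.
        if (!sorted.isEmpty() && sorted.get(0).tier == Tier.OFF_RACK) {
            int offRackCount = 0;
            while (offRackCount < sorted.size()
                    && sorted.get(offRackCount).tier == Tier.OFF_RACK) {
                offRackCount++;
            }
            Collections.swap(sorted, 0, random.nextInt(offRackCount));
        }
        return sorted;
    }
}
```

With this shape, only the head of an all-off-rack list varies between calls; everything else follows the incoming order.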

bq.  in case there were any nice page cache effects from directing reads to the 
same replica.

Getting crushed by thousands of off-rack requests ruins any benefit from the 
page cache.  At a minimum, the network interface will be saturated, at which 
point choosing another random off-rack replica is faster.  Let's say 
the first node is "bad" because of the load or for other reasons.  All off-rack 
tasks will be slowed until they fail over to the next node, which again will be 
another deterministic node, which may cause it to be considered "bad" in turn, 
and the cycle repeats.
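
To put rough numbers on that, here is a toy comparison of per-replica load. It is only a sketch: the client and replica counts are assumptions picked for illustration, and the class and method names are hypothetical rather than anything in HDFS.

```java
import java.util.Random;

class OffRackLoadSketch {
    // Deterministic order: every off-rack client reads from the same head replica.
    static int[] deterministicLoad(int clients, int replicas) {
        int[] load = new int[replicas];
        load[0] = clients;
        return load;
    }

    // Random choice: each off-rack client picks one replica uniformly at random.
    static int[] randomLoad(int clients, int replicas, Random random) {
        int[] load = new int[replicas];
        for (int i = 0; i < clients; i++) {
            load[random.nextInt(replicas)]++;
        }
        return load;
    }

    // Load on the busiest replica.
    static int max(int[] load) {
        int m = 0;
        for (int v : load) {
            m = Math.max(m, v);
        }
        return m;
    }
}
```

For example, with 3000 clients and 3 replicas (assumed figures), the deterministic case puts all 3000 reads on one node, while the random case leaves each node with roughly a third of them.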

Even on the same rack, I'm not sure there's a benefit to deterministically 
picking the same node.  There are generally only 2 replicas per rack.  If a large 
number of rack-local tasks start up, then dividing the load between the two 
replicas probably gives better performance.
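
One way to get that split, sketched under the assumption that ties within a locality level can be broken randomly: shuffle first, then stable-sort by tier. This is not the patch for this issue, and `Replica`, `Tier`, and `sortWithRandomTies` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class TierShuffleSketch {
    // Hypothetical locality tiers, declared from most to least preferred.
    enum Tier { NODE_LOCAL, RACK_LOCAL, OFF_RACK, DECOMMISSIONED }

    // Hypothetical replica descriptor; not an HDFS class.
    static final class Replica {
        final String name;
        final Tier tier;
        Replica(String name, Tier tier) { this.name = name; this.tier = tier; }
    }

    static List<Replica> sortWithRandomTies(List<Replica> locations, Random random) {
        List<Replica> result = new ArrayList<>(locations);
        // Randomize the order first...
        Collections.shuffle(result, random);
        // ...then stable-sort by tier, so the tier order is strict while
        // replicas within the same tier stay in their shuffled order.
        result.sort(Comparator.comparing((Replica r) -> r.tier));
        return result;
    }
}
```

With two rack-local replicas, each should end up first for about half of the callers, while off-rack and decommissioned replicas still sort after them.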


> Clients are always sent to the same datanode when read is off rack
> ------------------------------------------------------------------
>
>                 Key: HDFS-6840
>                 URL: https://issues.apache.org/jira/browse/HDFS-6840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a 
> given block and locality level (e.g.: local, rack, off-rack), so off-rack 
> clients all see the same datanode for the same block.  This leads to very 
> poor behavior in distributed cache localization and other scenarios where 
> many clients all want the same block data at approximately the same time.  
> The one datanode is crushed by the load while the other replicas only handle 
> local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
