[ https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094111#comment-14094111 ]
Daryn Sharp commented on HDFS-6840:
-----------------------------------
bq. My impression of the old pseudo-sort was that it was deterministic. AFAIK
there wasn't a Random doing a shuffle.
I'm familiar with the block location selection and helped debug this issue. As
Jason points out, the sort order used to be node-local, rack-local, off-rack,
then decommissioned. If the first location wasn't node-local or rack-local, a
random off-rack replica was picked for the first position. The rest of the
list, yes, was deterministic.
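A minimal Java sketch of the old ordering as described above, for illustration only. It assumes four locality tiers and a caller-supplied Random; the names OldPseudoSort, Tier, Replica and pseudoSort are hypothetical and this is not the actual HDFS code. It only models the described behavior: a stable sort by tier, then, if nothing is node- or rack-local, a random off-rack replica is swapped to the front while everything else keeps its deterministic order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical model of the pre-HDFS-6268 ordering described above; not HDFS source.
class OldPseudoSort {
    // Locality tiers relative to the reader, closest first; declaration order drives the sort.
    enum Tier { NODE_LOCAL, RACK_LOCAL, OFF_RACK, DECOMMISSIONED }

    // A replica location tagged with its tier (hypothetical type).
    static final class Replica {
        final String host;
        final Tier tier;

        Replica(String host, Tier tier) {
            this.host = host;
            this.tier = tier;
        }
    }

    static List<Replica> pseudoSort(List<Replica> locations, Random random) {
        List<Replica> sorted = new ArrayList<>(locations);
        // List.sort is stable, so replicas within a tier keep their incoming order.
        sorted.sort(Comparator.comparing((Replica r) -> r.tier));
        if (!sorted.isEmpty() && sorted.get(0).tier == Tier.OFF_RACK) {
            // Nothing node- or rack-local: find the length of the off-rack run...
            int offRack = 0;
            while (offRack < sorted.size() && sorted.get(offRack).tier == Tier.OFF_RACK) {
                offRack++;
            }
            // ...and swap a randomly chosen off-rack replica into first position.
            Collections.swap(sorted, 0, random.nextInt(offRack));
        }
        return sorted;
    }
}
```

Only the first slot is randomized here; the second and later choices stay in the same order for every client, which matches the "rest of the list was deterministic" remark.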
bq. in case there were any nice page cache effects from directing reads to the
same replica.
Getting crushed by thousands of off-rack requests ruins any benefit from the
page cache. At a minimum, the network interface will be saturated, at which
point choosing another random off-rack replica is faster. Let's say the first
node is "bad" because of the load or for other reasons. All off-rack tasks
will be slowed until they fail over to the next node, which, because the order
is deterministic, will again be the same node for every client. That may cause
it to be considered "bad" as well, and the cycle repeats.
Even on the same rack, I'm not sure there's a benefit to deterministically
picking the same node. There are generally only 2 replicas per rack. If a large
number of rack-local tasks start up, dividing the load between the two
replicas probably performs better.
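To illustrate the rack-local point, here is a hypothetical sketch that keeps the tier order but shuffles replicas within each tier, then counts how often each host is the first choice across many simulated clients. TierShuffle, sortAndShuffle and firstChoiceCounts are illustrative names, not HDFS APIs, and it assumes locality has already been reduced to an integer rank per replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: keep tier order, but shuffle replicas within each tier.
class TierShuffle {
    // rank: 0 = node-local, 1 = rack-local, 2 = off-rack, 3 = decommissioned (assumed encoding).
    static final class Replica {
        final String host;
        final int rank;

        Replica(String host, int rank) {
            this.host = host;
            this.rank = rank;
        }
    }

    static List<Replica> sortAndShuffle(List<Replica> locations, Random random) {
        List<Replica> sorted = new ArrayList<>(locations);
        sorted.sort((a, b) -> Integer.compare(a.rank, b.rank));
        int start = 0;
        while (start < sorted.size()) {
            // Find the end of the run of replicas sharing this rank.
            int end = start;
            while (end < sorted.size() && sorted.get(end).rank == sorted.get(start).rank) {
                end++;
            }
            // Shuffle only [start, end), so closer tiers still come first.
            Collections.shuffle(sorted.subList(start, end), random);
            start = end;
        }
        return sorted;
    }

    // Counts how often each host is chosen first across simulated clients.
    static Map<String, Integer> firstChoiceCounts(List<Replica> locations, int clients, Random random) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < clients; i++) {
            counts.merge(sortAndShuffle(locations, random).get(0).host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two rack-local replicas (dnA, dnB) and one off-rack replica (dnC).
        List<Replica> locations = Arrays.asList(
                new Replica("dnA", 1), new Replica("dnB", 1), new Replica("dnC", 2));
        System.out.println(firstChoiceCounts(locations, 1000, new Random()));
    }
}
```

With two rack-local replicas and one off-rack replica, the first-choice counts should split roughly evenly between dnA and dnB, and dnC is never picked first because the tier order is preserved.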
> Clients are always sent to the same datanode when read is off rack
> ------------------------------------------------------------------
>
> Key: HDFS-6840
> URL: https://issues.apache.org/jira/browse/HDFS-6840
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.5.0
> Reporter: Jason Lowe
> Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a
> given block and locality level (e.g., local, rack-local, off-rack), so off-rack
> clients all see the same datanode for the same block. This leads to very
> poor behavior in distributed cache localization and other scenarios where
> many clients all want the same block data at approximately the same time.
> The one datanode is crushed by the load while the other replicas only handle
> local and rack-local requests.
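The failure mode in the description can be modeled with a short, hypothetical Java sketch. It assumes, purely for illustration, that the off-rack order comes from a Random seeded with the block ID; the issue only says the order is deterministic per block and locality level, so the seed source here is a stand-in, and PerBlockSeedDemo and its methods are made-up names. With a per-block seed every client computes the same permutation, so the set of distinct first choices has exactly one element; drawing each request from one shared generator spreads first choices across the replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical model of the reported behavior; the block-ID seed is an assumption.
class PerBlockSeedDemo {
    static List<String> orderOffRack(List<String> offRackHosts, Random random) {
        List<String> order = new ArrayList<>(offRackHosts);
        Collections.shuffle(order, random);
        return order;
    }

    // Distinct first choices when the seed depends only on the block.
    static Set<String> firstChoicesSeededByBlock(List<String> hosts, long blockId, int clients) {
        Set<String> firsts = new HashSet<>();
        for (int i = 0; i < clients; i++) {
            // Same seed for every client reading this block => same permutation.
            firsts.add(orderOffRack(hosts, new Random(blockId)).get(0));
        }
        return firsts;
    }

    // Distinct first choices when every request draws from one shared generator.
    static Set<String> firstChoicesSharedGenerator(List<String> hosts, Random shared, int clients) {
        Set<String> firsts = new HashSet<>();
        for (int i = 0; i < clients; i++) {
            firsts.add(orderOffRack(hosts, shared).get(0));
        }
        return firsts;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("dn1", "dn2", "dn3");
        long blockId = 1073741825L; // arbitrary example block ID
        System.out.println(firstChoicesSeededByBlock(hosts, blockId, 1000)); // exactly one host
        System.out.println(firstChoicesSharedGenerator(hosts, new Random(), 1000)); // usually all three
    }
}
```

Reseeding from the block on every request is what makes the order reproducible, and that reproducibility is exactly what concentrates all off-rack readers on one datanode.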
--
This message was sent by Atlassian JIRA
(v6.2#6252)