[ https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065966#comment-14065966 ]

Ashwin Shankar commented on HDFS-6268:
--------------------------------------

Hi [~andrew.wang],
You're correct: this behavior is not a regression. We saw this issue before 
applying your patch too.

bq. The DFSClient should also be failing over to some other replica after a 
timeout, so I'm surprised your containers are getting stuck.
We run our clusters on Amazon AWS, which doesn't differentiate between 
rack-local and off-switch nodes: off-switch nodes are considered rack-local as 
well. For big jobs, when containers enter their LOCALIZING phase, in which 
they download resources from HDFS, an off-switch datanode that is treated as 
rack-local gets bombarded by hundreds of tasks. Sometimes the resources to be 
downloaded are large (e.g., the hashtable in a Hive map join), and when an 
off-switch node is hit by hundreds of tasks, containers take more than 10 
minutes to download them, by which time the AM times them out and kills them.

bq. Anyway, if you want to add a new config to not use a seed (default false), 
I'd be happy to review.
Thanks, Andrew! I've done that and posted a patch in HDFS-6701. This is a 
little urgent; it would be very helpful if we could get it reviewed and 
committed quickly.



> Better sorting in NetworkTopology#pseudoSortByDistance when no local node is 
> found
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6268
>                 URL: https://issues.apache.org/jira/browse/HDFS-6268
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch, hdfs-6268-3.patch, 
> hdfs-6268-4.patch, hdfs-6268-5.patch, hdfs-6268-branch-2.001.patch
>
>
> In NetworkTopology#pseudoSortByDistance, if no local node is found, it will 
> always place the first rack local node in the list in front.
> This became an issue when a dataset was loaded from a single datanode. This 
> datanode ended up being the first replica for all the blocks in the dataset. 
> When running an Impala query, the non-local reads when reading past a block 
> boundary were all hitting this node, meaning massive load skew.
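The skew described above can be reproduced with a simplified Java sketch. `oldPseudoSort` models only the behavior stated in the description (a local replica goes first; otherwise the first rack-local replica in list order goes first). `sortByWeightShuffled` shows one way to spread load: group replicas by distance and shuffle within each group. The class, method names, the 0/1/2 weights, and the rack map are all hypothetical; this is not the NetworkTopology code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class PseudoSortDemo {
    // Hypothetical distance weights: 0 = same node, 1 = same rack, 2 = off-rack.
    static int weight(String reader, String node, Map<String, String> rackOf) {
        if (reader.equals(node)) return 0;
        if (rackOf.get(reader).equals(rackOf.get(node))) return 1;
        return 2;
    }

    // Simplified model of the described behavior: a local replica goes first;
    // otherwise the FIRST rack-local replica in list order goes first.
    // Off-rack handling is ignored; the rest of the list keeps its order.
    static List<String> oldPseudoSort(String reader, List<String> nodes, Map<String, String> rackOf) {
        List<String> out = new ArrayList<>(nodes);
        for (int target = 0; target <= 1; target++) {
            for (int i = 0; i < out.size(); i++) {
                if (weight(reader, out.get(i), rackOf) == target) {
                    Collections.swap(out, 0, i);
                    return out;
                }
            }
        }
        return out;
    }

    // Sketch of a fairer sort: bucket replicas by weight (TreeMap keeps buckets
    // ordered nearest first), shuffle each bucket, then concatenate.
    static List<String> sortByWeightShuffled(String reader, List<String> nodes,
                                             Map<String, String> rackOf, Random rand) {
        TreeMap<Integer, List<String>> tiers = new TreeMap<>();
        for (String n : nodes) {
            tiers.computeIfAbsent(weight(reader, n, rackOf), k -> new ArrayList<>()).add(n);
        }
        List<String> out = new ArrayList<>();
        for (List<String> tier : tiers.values()) {
            Collections.shuffle(tier, rand);
            out.addAll(tier);
        }
        return out;
    }

    public static void main(String[] args) {
        // All hosts on one rack; dn1 is listed first for every block, as when
        // a dataset is loaded from a single datanode.
        Map<String, String> rackOf = new HashMap<>();
        for (String h : Arrays.asList("client", "dn1", "dn2", "dn3")) rackOf.put(h, "/rack1");
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        Random rand = new Random();
        for (int read = 0; read < 5; read++) {
            System.out.println("old: " + oldPseudoSort("client", replicas, rackOf).get(0)
                + "  shuffled: " + sortByWeightShuffled("client", replicas, rackOf, rand).get(0));
        }
    }
}
```

In this setup `oldPseudoSort` sends every non-local read to dn1, while the shuffled version varies the first choice among dn1, dn2 and dn3 and still keeps local replicas ahead of rack-local ones, and rack-local ones ahead of off-rack ones.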



--
This message was sent by Atlassian JIRA
(v6.2#6252)
