[ 
https://issues.apache.org/jira/browse/HDFS-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086451#comment-17086451
 ] 

Yang Yun commented on HDFS-15278:
---------------------------------

The balancer may move some blocks back, leaving one Datanode with more than 
one block of the same file. I think the probability is low in a large cluster, 
and in our case it is fine if only a few blocks of a file sit on one Datanode. 
We also have a background job that checks locality periodically and disperses 
the blocks if needed.

We hit this issue with files created by YARN jobs. In our case, the 
Nodemanagers are deployed on the same machines as the Datanodes, so the local 
Datanode is chosen for the first replica. After setrep to 1, most blocks were 
located on the same Datanode.

> After execute ‘-setrep 1’, make sure that blocks of the file are dispersed 
> across different datanodes
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15278
>                 URL: https://issues.apache.org/jira/browse/HDFS-15278
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Yang Yun
>            Assignee: Yang Yun
>            Priority: Minor
>         Attachments: HDFS-15278.001.patch, HDFS-15278.002.patch
>
>
> After executing ‘-setrep 1’, many blocks of the file may be located on the 
> same machine, especially if the file was written from one datanode machine. 
> That causes data hot spots and is hard to fix if this machine is down.
> Add a chosen history to make sure that blocks of the file are dispersed 
> across different datanodes.
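
The "chosen history" idea in the description can be illustrated with a small, 
self-contained sketch. This is not the code from the attached patches, and it 
does not use any NameNode classes: the class name ChosenHistorySketch, the 
method chooseReplicasToKeep, and the use of plain strings as datanode names 
are all assumptions made for illustration. For each block, the sketch keeps 
the replica on the datanode that has been chosen least often so far for the 
same file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the HDFS-15278 patch.
class ChosenHistorySketch {

    /**
     * Picks one replica to keep per block when replication drops to 1.
     *
     * @param blockLocations for each block of the file, the datanodes that
     *                       currently hold a replica (names are made up)
     * @return the datanode kept for each block, in block order; null for a
     *         block that has no replicas listed
     */
    static List<String> chooseReplicasToKeep(List<List<String>> blockLocations) {
        // Chosen history: how many blocks of this file each datanode keeps so far.
        Map<String, Integer> chosenHistory = new HashMap<>();
        List<String> kept = new ArrayList<>();

        for (List<String> replicas : blockLocations) {
            String best = null;
            int bestCount = Integer.MAX_VALUE;
            for (String datanode : replicas) {
                int count = chosenHistory.getOrDefault(datanode, 0);
                // Strict comparison: on a tie, the earlier replica wins.
                if (count < bestCount) {
                    best = datanode;
                    bestCount = count;
                }
            }
            if (best != null) {
                chosenHistory.merge(best, 1, Integer::sum);
            }
            kept.add(best);
        }
        return kept;
    }
}
```

With three blocks located on (dn1, dn2, dn3), (dn1, dn3, dn4) and 
(dn1, dn2, dn4), always keeping the first replica would leave every block on 
dn1, while the sketch keeps dn1, dn3 and dn2 respectively.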


