[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

Mingliang Liu (JIRA) Tue, 26 Apr 2016 19:15:07 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mingliang Liu updated HDFS-10335:
---------------------------------
    Description: 
Currently, 
{{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
chooses the first matching target datanode from the candidate list. As a 
result, the mover may schedule many tasks on just a few datanodes (the first 
several in the candidate list). Overall performance suffers significantly 
because network and disk usage on those datanodes becomes saturated. In 
particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
scheduled move tasks are queued on a few storage groups, regardless of other 
available storage groups. We need an algorithm that distributes the move 
tasks approximately evenly across all the candidate target storage groups.
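
One possible way to spread the choice, given here only as a minimal sketch 
under stated assumptions: shuffle a copy of the candidate list before scanning 
it, so that every matching storage group is equally likely to be picked. The 
class ShuffledTargetChooser, the chooseTarget signature shown, and the 
"dnN:TYPE" strings are hypothetical illustrations, not the Mover code, and 
this is not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch: class and method names are illustrative, not Mover code.
final class ShuffledTargetChooser {

  // Returns a matching candidate chosen uniformly at random among all
  // matches, or null when nothing matches. The caller's list is not modified.
  static <T> T chooseTarget(List<T> candidates,
                            Predicate<? super T> matches,
                            Random random) {
    List<T> shuffled = new ArrayList<>(candidates);
    Collections.shuffle(shuffled, random);
    for (T candidate : shuffled) {
      if (matches.test(candidate)) {
        return candidate;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Assumed stand-ins for storage groups: "<datanode>:<storage type>".
    List<String> storageGroups =
        Arrays.asList("dn1:DISK", "dn2:ARCHIVE", "dn3:ARCHIVE", "dn4:ARCHIVE");
    Random random = new Random();
    // Repeated calls no longer return dn2:ARCHIVE every time.
    for (int i = 0; i < 6; i++) {
      System.out.println(
          chooseTarget(storageGroups, g -> g.endsWith(":ARCHIVE"), random));
    }
  }
}
```

Random selection evens out the load only on average. A chooser that also 
tracked pending moves per target could balance more tightly, at the cost of 
extra bookkeeping.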

Thanks [~szetszwo] for offline discussion.

  was:
Currently, 
{{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
chooses the first matching target datanode from the candidate list. As a 
result, the mover may schedule many tasks on just a few datanodes (the first 
several in the candidate list). Overall performance suffers significantly 
because network and disk usage on those datanodes becomes saturated. In 
particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
scheduled move tasks are queued on a few datanodes, regardless of other 
available storage resources. We need an algorithm that distributes the move 
tasks approximately evenly across all the candidate target datanodes 
(storages).

Thanks [~szetszwo] for offline discussion.


> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10335
>                 URL: https://issues.apache.org/jira/browse/HDFS-10335
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.8.0
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>            Priority: Critical
>
> Currently, 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. As a 
> result, the mover may schedule many tasks on just a few datanodes (the first 
> several in the candidate list). Overall performance suffers significantly 
> because network and disk usage on those datanodes becomes saturated. In 
> particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks are queued on a few storage groups, regardless of other 
> available storage groups. We need an algorithm that distributes the move 
> tasks approximately evenly across all the candidate target storage groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
