[jira] [Comment Edited] (HDFS-12011) Add a new load balancing volume choosing policy

Ravi Prakash (JIRA) Thu, 22 Jun 2017 13:03:45 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-12011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059841#comment-16059841
 ]


Ravi Prakash edited comment on HDFS-12011 at 6/22/17 8:02 PM:
--------------------------------------------------------------

Hi Chencan! Thanks for your contribution. 

Technically this sounds like a good idea. Please bear in mind, though, that HDFS 
and YARN most often run on the same node. Even if HDFS accurately tracked which 
disks are in use (I doubt it does), YARN may well be hammering the disks this 
policy chooses. Moreover, we don't know what the Linux kernel's pagecache 
behavior will really be under the JVM (even when you think your write has 
completed, it may just be in the pagecache and will be synced to the block 
device whenever Linux decides is a good time). Having said all of this, given 
that the policy is pluggable, I can't think of a reason why we wouldn't want 
it. If you have production clusters (do you?) on which you can refine this 
policy, I think it'd be a great contribution. Have you done any performance 
tests that show this policy is better than the other ones? Also, it's not clear 
to me that the reference count is a good proxy for measuring the load on a disk.
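
To make the pagecache caveat concrete, here is a minimal Java sketch (not 
taken from the patch or from the datanode code; the class and method names 
are made up for illustration). A write through a FileChannel can return while 
the bytes are still only in the kernel's pagecache; only an explicit 
force(true) asks the OS to push them to the device. Any load estimate based on 
"writes in flight" as the JVM sees them can therefore lag what the disk is 
actually doing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: shows where a JVM write "completes" versus where it is synced. */
public class PageCacheSketch {

    /** Writes data to file, then forces it to the storage device; returns the file size. */
    static long writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // The write calls have returned, but the bytes may still live only
            // in the OS pagecache. The kernel decides when to flush them.
            ch.force(true); // request that data and metadata reach the device
            return ch.size();
        }
    }

    /** Runs the sketch against a temporary file and cleans up afterwards. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("pagecache-sketch", ".bin");
            try {
                return writeAndSync(tmp, new byte[] {1, 2, 3, 4, 5});
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written and synced: " + demo());
    }
}
```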

I'll take a look at the patch, but could you please try to fix all the -1s from 
HadoopQA's comment?




> Add a new load balancing volume choosing policy
> ------------------------------------------------
>
>                 Key: HDFS-12011
>                 URL: https://issues.apache.org/jira/browse/HDFS-12011
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: chencan
>            Assignee: chencan
>         Attachments: HADOOP-12011.patch
>
>
>     There are two volume choosing policies for selecting a volume within a 
> datanode to write a block to: RoundRobinVolumeChoosingPolicy and 
> AvailableSpaceVolumeChoosingPolicy. Neither policy takes the fsvolume's load 
> into account. We can add a new load-balancing volume choosing policy that 
> uses the existing reference in FsVolumeImpl.
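
As a rough illustration of the idea in the description, the sketch below picks 
the volume with the fewest outstanding references among those that have room 
for the block. It is not the attached patch and does not use Hadoop's real 
interfaces: the Volume class, its fields, and the choose method are 
hypothetical stand-ins, and treating the reference count as a load signal is 
exactly the assumption questioned in the comment above. It also sees nothing 
of I/O issued by other processes on the node, such as YARN containers.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a least-referenced volume chooser; not Hadoop code. */
public class LeastReferencedVolumeSketch {

    /** Stand-in for a datanode volume; all fields are illustrative. */
    static final class Volume {
        final String name;
        final long availableBytes;
        final int referenceCount; // assumed to approximate current users of the volume

        Volume(String name, long availableBytes, int referenceCount) {
            this.name = name;
            this.availableBytes = availableBytes;
            this.referenceCount = referenceCount;
        }
    }

    /**
     * Returns the volume with the lowest reference count among those that can
     * hold blockSize bytes. Ties go to the earliest volume in the list.
     */
    static Volume choose(List<Volume> volumes, long blockSize) {
        Volume best = null;
        for (Volume v : volumes) {
            if (v.availableBytes < blockSize) {
                continue; // not enough room for the block
            }
            if (best == null || v.referenceCount < best.referenceCount) {
                best = v;
            }
        }
        if (best == null) {
            throw new IllegalStateException(
                "no volume has " + blockSize + " bytes available");
        }
        return best;
    }

    public static void main(String[] args) {
        List<Volume> volumes = Arrays.asList(
            new Volume("disk1", 500L, 7),
            new Volume("disk2", 500L, 2),
            new Volume("disk3", 10L, 0)); // least referenced, but too full
        System.out.println(choose(volumes, 100L).name); // prints "disk2"
    }
}
```

A benchmark comparing this kind of chooser against the round-robin and 
available-space policies would be needed before drawing any conclusion about 
whether it actually balances disk load.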



