[ https://issues.apache.org/jira/browse/HDFS-12011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059841#comment-16059841 ]
Ravi Prakash edited comment on HDFS-12011 at 6/22/17 8:02 PM:
--------------------------------------------------------------
Hi Chencan! Thanks for your contribution.
Technically this sounds like a good idea. Please bear in mind though that most
often HDFS and YARN are running on the same node. Even if HDFS tracked which
disks are being used accurately (I doubt it does), YARN may well be hammering
the disks you choose using this policy. Moreover, we don't know what the Linux
kernel's pagecache behavior really will be under the JVM (even when you think
your write has completed, it may just be in the pagecache and will be synced to
the block device whenever Linux decides is a good time). Having said all of
this, given that the policy is pluggable I can't think of a reason why we
wouldn't want this policy. If you have production clusters (do you?) on which
you can refine this policy, I think it'd be a great contribution. Have you done
any performance tests that illustrate this policy is better than the other ones?
Also, it's not clear to me that the reference count is a good proxy for
measuring the load on a disk.
I'll take a look at the patch, but could you please try to fix all the -1s from
HadoopQA's comment?
> Add a new load balancing volume choosing policy
> ------------------------------------------------
>
> Key: HDFS-12011
> URL: https://issues.apache.org/jira/browse/HDFS-12011
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: chencan
> Assignee: chencan
> Attachments: HADOOP-12011.patch
>
>
> There are two volume choosing policies for choosing a volume
> within a datanode to write a data block to: RoundRobinVolumeChoosingPolicy and
> AvailableSpaceVolumeChoosingPolicy. Neither of these policies takes into account
> the fsvolume's load. We can add a new load balancing volume choosing policy,
> using the existing reference count in FsVolumeImpl.
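
For illustration only, the idea in the description above could be sketched roughly as follows. This is not the attached patch and does not use the real Hadoop interfaces: Volume, SimpleVolume and LeastReferencedVolumeChooser are hypothetical names, and the sketch assumes that a volume's reference count is an acceptable stand-in for its load, which the comment above questions.

```java
import java.util.List;

/** Hypothetical stand-in for a datanode volume; NOT the real FsVolumeSpi. */
interface Volume {
    long getAvailable();       // free bytes on the volume
    int getReferenceCount();   // outstanding references (assumed load proxy)
}

/** Minimal hypothetical implementation, used only to exercise the sketch. */
final class SimpleVolume implements Volume {
    final String name;
    private final long available;
    private final int referenceCount;

    SimpleVolume(String name, long available, int referenceCount) {
        this.name = name;
        this.available = available;
        this.referenceCount = referenceCount;
    }

    @Override public long getAvailable() { return available; }
    @Override public int getReferenceCount() { return referenceCount; }
}

/** Sketch: choose the volume with the fewest outstanding references. */
final class LeastReferencedVolumeChooser {
    static <V extends Volume> V chooseVolume(List<V> volumes, long replicaSize) {
        V best = null;
        for (V v : volumes) {
            // Skip volumes that cannot hold the replica at all.
            if (v.getAvailable() < replicaSize) {
                continue;
            }
            boolean fewerRefs = best == null
                || v.getReferenceCount() < best.getReferenceCount();
            // Break ties on reference count by preferring more free space.
            boolean tieMoreSpace = best != null
                && v.getReferenceCount() == best.getReferenceCount()
                && v.getAvailable() > best.getAvailable();
            if (fewerRefs || tieMoreSpace) {
                best = v;
            }
        }
        if (best == null) {
            throw new IllegalStateException(
                "No volume has " + replicaSize + " bytes available");
        }
        return best;
    }
}
```

Note that a count like this only reflects activity the datanode itself knows about. It would not see I/O from YARN containers on the same disks, or writes still sitting in the page cache, so any such policy would need to be validated with measurements on a real cluster.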