Re: [jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

Arun C Murthy Sat, 11 Aug 2012 21:31:26 -0700

Jira is down, so I'll comment here....

I'd really encourage you to enforce this in the DataNode, throwing an 
UnsupportedOperationException when the feature is disabled, rather than 
gating it only through a client-side config.
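
For illustration only, a server-side check along these lines might look like 
the Java sketch below. The class name, the config key, and the placeholder 
disk-id mapping are all hypothetical; none of them are taken from the 
HDFS-3672 patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a DataNode-side handler; not the actual HDFS-3672 code.
class DiskLocationHandler {
    // Hypothetical key name, chosen for illustration only.
    static final String ENABLED_KEY = "example.datanode.disk-location.enabled";

    private final boolean enabled;

    DiskLocationHandler(Map<String, String> serverConf) {
        // Assumption: the feature stays off unless the admin turns it on.
        this.enabled = Boolean.parseBoolean(serverConf.getOrDefault(ENABLED_KEY, "false"));
    }

    // Returns one placeholder disk id per block id, or refuses the call outright.
    int[] getDiskIds(long[] blockIds) {
        if (!enabled) {
            // The server rejects the request no matter how the client is configured.
            throw new UnsupportedOperationException(
                "Disk-location queries are disabled on this DataNode; set "
                    + ENABLED_KEY + " to true to enable them");
        }
        int[] diskIds = new int[blockIds.length];
        for (int i = 0; i < blockIds.length; i++) {
            // Placeholder mapping; a real DataNode would look up the block's volume.
            diskIds[i] = (int) Math.floorMod(blockIds[i], 4L);
        }
        return diskIds;
    }
}
```

With this shape, a client that enables the feature locally still gets an 
UnsupportedOperationException unless the server-side setting is on.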


Arun

On Aug 9, 2012, at 6:39 AM, Aaron T. Myers (JIRA) wrote:

> 
>    [ 
> https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431802#comment-13431802
>  ] 
> 
> Aaron T. Myers commented on HDFS-3672:
> --------------------------------------
> 
> bq. Why is this API marked @InterfaceAudience.Public? I think we should 
> remove it and just leave InterfaceStability.Unstable.
> 
> I was under the impression that all public classes needed to have an 
> @InterfaceAudience annotation, and all public classes needed to have an 
> @InterfaceStability annotation unless they're marked 
> @InterfaceAudience.Private. Am I wrong about that?
> 
> bq. Configuration to turn off this functionality should be on the server side 
> also. Otherwise a client can just enable this functionality without the 
> admin having control over it.
> 
> I thought about this a fair bit while reviewing the code. The conclusion that 
> I came to is that the stated reason that Arun wanted this feature disabled by 
> default was "so that people who use this understand that this isn't 
> necessarily supported." A client-side-only config seems to serve that 
> purpose. Making this config server side as well only serves to require the 
> admin enable the config and restart their cluster before some client that 
> wants to try to use this functionality can give it a shot. That seems to me 
> to be a strictly unnecessary pain for both the admin and user that doesn't 
> seem to further Arun's stated goal. For that matter, why would an admin want 
> to prevent clients from calling this API? If you insist on having a server 
> side config for this, I'd like to suggest having two separate configs: a 
> server-side one that defaults to enabled, but so that an admin may 
> consciously disable it, and a client-side config that defaults to disabled so 
> that users of this API must consciously configure their client, to support 
> Arun's stated goal of making sure people are aware that it's an experimental 
> API.
> 
>> Expose disk-location information for blocks to enable better scheduling
>> -----------------------------------------------------------------------
>> 
>>                Key: HDFS-3672
>>                URL: https://issues.apache.org/jira/browse/HDFS-3672
>>            Project: Hadoop HDFS
>>         Issue Type: Improvement
>>   Affects Versions: 2.0.0-alpha
>>           Reporter: Andrew Wang
>>           Assignee: Andrew Wang
>>        Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, 
>> hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, 
>> hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch
>> 
>> 
>> Currently, HDFS exposes on which datanodes a block resides, which allows 
>> clients to make scheduling decisions for locality and load balancing. 
>> Extending this to also expose on which disk on a datanode a block resides 
>> would enable even better scheduling, on a per-disk rather than coarse 
>> per-datanode basis.
>> This API would likely look similar to FileSystem#getFileBlockLocations, but 
>> also involve a series of RPCs to the responsible datanodes to determine disk 
>> ids.
> 
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
