[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430462#comment-13430462 ]
Andrew Wang commented on HDFS-3672:
-----------------------------------

Thanks for the (very thorough) reviews. Addressed as recommended, except as follows:

bq. In the "re-group the locatedblocks to be grouped by datanodes..." loop, it seems like instead of the if (...) check, you could just put the initialization of the LocatedBlock list inside the outer loop, before the inner loop.

I think it's right as is. Potentially, you need to add a new list for every datanode replica of every LocatedBlock, so the initialization has to happen inside the doubly nested loop.

bq. Rather than using a hard-coded 10 threads for the ThreadPoolExecutor, please make this configurable. I think it's reasonable to not document it in a *-default.xml file, since most users will never want to change this value, but if someone does find the need to do it it'd be nice to not have to recompile.

Since I already had hdfs-default.xml open to add the timeout config option, I documented this one too.

> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch
>
>
> Currently, HDFS exposes which datanodes a block resides on, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose which disk on a datanode a block resides on would enable even better scheduling, on a per-disk rather than coarse per-datanode basis.
>
> This API would likely look similar to FileSystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids.
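For readers following along without the patch in hand, the regrouping point above can be sketched as follows. This is not the patch's actual code; the types and names here (plain strings standing in for LocatedBlock and DatanodeInfo) are simplified placeholders, and only the loop shape matters: a fresh per-datanode list may be needed for any replica of any block, so the existence check cannot be hoisted out of the inner loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByDatanode {

  /**
   * Regroup blocks by the datanodes that hold their replicas.
   * Input: block id -> list of datanode ids holding a replica.
   * Output: datanode id -> list of block ids on that datanode.
   */
  static Map<String, List<String>> groupByDatanode(
      Map<String, List<String>> blockReplicas) {
    Map<String, List<String>> perDatanode = new HashMap<>();
    for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
      for (String dn : e.getValue()) {
        // A new list can be needed for any datanode of any block, so the
        // initialization check lives inside the doubly nested loop; moving
        // it before the inner loop would miss datanodes first seen here.
        List<String> list = perDatanode.get(dn);
        if (list == null) {
          list = new ArrayList<>();
          perDatanode.put(dn, list);
        }
        list.add(e.getKey());
      }
    }
    return perDatanode;
  }

  public static void main(String[] args) {
    Map<String, List<String>> replicas = new HashMap<>();
    replicas.put("blk_1", List.of("dn1", "dn2"));
    replicas.put("blk_2", List.of("dn2", "dn3"));
    System.out.println(groupByDatanode(replicas));
  }
}
```

With the sample input above, "dn2" ends up holding both blocks while "dn1" and "dn3" each hold one, which is exactly the per-datanode view the RPC fan-out needs.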
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira