[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430462#comment-13430462 ]
Andrew Wang commented on HDFS-3672:
-----------------------------------

Thanks for the (very thorough) reviews. Addressed as recommended, except as follows:

bq. In the "re-group the locatedblocks to be grouped by datanodes..." loop, it seems like instead of the if (...) check, you could just put the initialization of the LocatedBlock list inside the outer loop, before the inner loop.

I think it's right as is. Potentially, you need to add a new list for every datanode replica of every LocatedBlock, so the initialization has to happen inside the doubly nested loop.

bq. Rather than using a hard-coded 10 threads for the ThreadPoolExecutor, please make this configurable. I think it's reasonable to not document it in a *-default.xml file, since most users will never want to change this value, but if someone does find the need to do it it'd be nice to not have to recompile.

Since I already had hdfs-default.xml open to add the timeout config option, I documented this one too.

> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch
>
>
> Currently, HDFS exposes which datanodes a block resides on, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose which disk on a datanode a block resides on would enable even better scheduling, on a per-disk rather than coarse per-datanode basis.
>
> This API would likely look similar to FileSystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids.
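For readers following along without the patch in hand, the regrouping point above can be sketched as follows. This is not the patch's actual code; the types and names here (plain strings standing in for LocatedBlock and DatanodeInfo) are simplified placeholders, and only the loop shape matters: a fresh per-datanode list may be needed for any replica of any block, so the existence check cannot be hoisted out of the inner loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByDatanode {

  /**
   * Regroup blocks by the datanodes that hold their replicas.
   * Input: block id -> list of datanode ids holding a replica.
   * Output: datanode id -> list of block ids on that datanode.
   */
  static Map<String, List<String>> groupByDatanode(
      Map<String, List<String>> blockReplicas) {
    Map<String, List<String>> perDatanode = new HashMap<>();
    for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
      for (String dn : e.getValue()) {
        // A new list can be needed for any datanode of any block, so the
        // initialization check lives inside the doubly nested loop; moving
        // it before the inner loop would miss datanodes first seen here.
        List<String> list = perDatanode.get(dn);
        if (list == null) {
          list = new ArrayList<>();
          perDatanode.put(dn, list);
        }
        list.add(e.getKey());
      }
    }
    return perDatanode;
  }

  public static void main(String[] args) {
    Map<String, List<String>> replicas = new HashMap<>();
    replicas.put("blk_1", List.of("dn1", "dn2"));
    replicas.put("blk_2", List.of("dn2", "dn3"));
    System.out.println(groupByDatanode(replicas));
  }
}
```

With the sample input above, "dn2" ends up holding both blocks while "dn1" and "dn3" each hold one, which is exactly the per-datanode view the RPC fan-out needs.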
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira