[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431601#comment-13431601 ]
Andrew Wang commented on HDFS-3672:
-----------------------------------

Thanks for the design doc + code review, Suresh. The doc comments I've fixed. For the code comments, I'd like to check a few things with you before cutting another patch; everything else I'll fix as recommended.

bq. I think DiskBlockLocation, DiskId are not generic names. What if the underlying storage is not local disks at all, but a mounted directory from another storage system?

Perhaps Storage(BlockLocation|Id)? Volume(BlockLocation|Id)? I'm not entirely sure of the end-user terminology here.

bq. Instead of getBlockFile, you may be better off using getReplicaInfo and using the volume from it to get the directory.

This was a really good point; the current code is buggy there. I changed it to do the comparisons on volumes, which should work as expected (no path comparisons).

bq. Is this functionality expected to be used for a single file or for blocks belonging to many files? Because the current way of using diskId as an index into a data structure that could change (with disk failures etc.) may not provide stable diskIds.

bq. I am surprised the API returns a list of diskIds plus a diskId for each block. What is the use of the first diskId list?

This could have been more clearly documented in the code; I'll beef it up. I did this based on [Todd's earlier comment|#comment-13419604]: basically we pass a list of the DiskIds on the datanode (one per volume), and then a list of indexes into this list (one per block). Since the DiskId of a volume should be the same for the life of a datanode, I think this is fairly stable. Clients are going to have to deal with some staleness in this disk-location info, since, as you noted, a block's volume can change based on configuration, failures, and normal HDFS operation. Clients that peek inside HDFS for BlockLocations via #getBlockLocations() need to be fairly sophisticated anyway, since they already deal with this kind of thing at the DN level.

bq. dfs.datanode.handler.count is 3. This could push the handler count limit.

Should I just bump the default (say, to 10)? I haven't done any performance testing, so I don't know if it's a problem.

bq. Please leave DFSClient#getBlockLocation() unchanged.

I'd love to do this, but there's currently a hacky thing going on here. The new {{#getDiskBlockLocations}} call needs {{ExtendedBlock}}s to query the datanodes, and the current {{#getBlockLocations}} returns {{BlockLocation}}s, which don't carry that information. That's why I changed it to return {{HdfsBlockLocation}} instead, which includes the required {{ExtendedBlock}} along with the {{BlockLocation}} data.
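On the {{HdfsBlockLocation}} point, the idea is simply a {{BlockLocation}} subclass that also carries the block identity needed for the datanode queries. A minimal sketch follows, assuming the constructor just copies the public {{BlockLocation}} fields and that the accessor is named {{getBlock()}}; the actual patch may wire this differently:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

/**
 * Sketch of a BlockLocation that also exposes the underlying block, so a
 * getDiskBlockLocations()-style call can turn client-visible locations back
 * into per-datanode block queries.
 */
public class HdfsBlockLocation extends BlockLocation {
  private final LocatedBlock block;

  public HdfsBlockLocation(BlockLocation loc, LocatedBlock block)
      throws IOException {
    // Copy the name/host/offset/length data from the plain BlockLocation.
    super(loc.getNames(), loc.getHosts(), loc.getOffset(), loc.getLength());
    this.block = block;
  }

  /** The block (with pool id and generation stamp) to send to datanodes. */
  public ExtendedBlock getBlock() {
    return block.getBlock();
  }
}
{code}

Because the subclass is still a {{BlockLocation}}, existing callers of {{#getBlockLocations}} presumably keep working unchanged while the new call gets the block identity it needs.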
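To make the list-of-DiskIds-plus-per-block-indexes shape described above a bit more concrete, here is a rough sketch of the per-datanode answer. The names here (DiskId fields, DiskIdResponse, diskIdForBlock) are illustrative only, not taken from the patch; the point is just the indirection of a stable per-volume identifier plus one index per queried block.

{code:java}
// Illustrative sketch only -- these names are hypothetical, not the patch API.
import java.util.List;

/** Opaque identifier for a volume, stable for the life of a datanode. */
class DiskId {
  private final byte[] id;
  DiskId(byte[] id) { this.id = id; }
  byte[] getId() { return id; }
}

/**
 * Hypothetical per-datanode response: one DiskId per volume on the datanode,
 * plus, for each queried block, an index into that volume list.
 */
class DiskIdResponse {
  private final List<DiskId> volumeIds;     // one per volume
  private final List<Integer> blockIndexes; // one per block, indexes volumeIds

  DiskIdResponse(List<DiskId> volumeIds, List<Integer> blockIndexes) {
    this.volumeIds = volumeIds;
    this.blockIndexes = blockIndexes;
  }

  /**
   * DiskId for the i-th queried block, or null if the datanode could not
   * find the block (it may have moved or been deleted since the query).
   */
  DiskId diskIdForBlock(int i) {
    int idx = blockIndexes.get(i);
    return idx < 0 ? null : volumeIds.get(idx);
  }
}
{code}

The indirection keeps the response small when many blocks share a volume, and a sentinel index gives the datanode a way to say it no longer has a block, which fits the expectation that clients tolerate some staleness in this information.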
> Expose disk-location information for blocks to enable better scheduling
> -------------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows clients to make scheduling decisions for locality and load balancing. Extending this to also expose on which disk on a datanode a block resides would enable even better scheduling, on a per-disk rather than coarse per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but also involve a series of RPCs to the responsible datanodes to determine disk ids.