[ https://issues.apache.org/jira/browse/HDFS-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756098#comment-13756098 ]
Junping Du commented on HDFS-5009:
----------------------------------
Hi Nicholas, thanks for the patch. I just finished my first round of review today;
some initial comments are below. Overall, the patch looks good to me.
{code}
+  public DatanodeStorageInfo[] getDatanodeStorageInfos(
+      DatanodeID[] datanodeID, String[] storageIDs)
+      throws UnregisteredNodeException {
+    if (datanodeID.length == 0) {
+      return null;
+    }
+    final DatanodeStorageInfo[] storages = new DatanodeStorageInfo[datanodeID.length];
+    for(int i = 0; i < datanodeID.length; i++) {
+      final DatanodeDescriptor dd = getDatanode(datanodeID[i]);
+      storages[i] = storageIDs == null || storageIDs.length == 0?
+          dd.getRandomStorageInfo(): dd.getStorageInfo(storageIDs[i]);
+    }
+    return storages;
+  }
{code}
I have several questions here: Do we expect storageIDs to be null or empty
while DatanodeID[] is not empty? If so, is it proper to pick a storage
randomly? A more serious issue may be that we should check that the lengths of
storageIDs and datanodeID are equal: if datanodeID.length > storageIDs.length,
this may throw an ArrayIndexOutOfBoundsException, and a mismatch between the
indices of datanodeID and storageIDs may also cause
dd.getStorageInfo(storageIDs[i]) to return null. I think this is an unexpected
result, isn't it?
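For example, a guard along these lines (just a sketch; I am assuming a length
mismatch should be reported as an invalid argument, and that
HadoopIllegalArgumentException is an acceptable choice here) would fail fast
with a clear message instead of throwing deep inside the loop:
{code}
// Sketch only: verify the two arrays line up before indexing into them.
// Assumes a length mismatch should be reported as an invalid argument.
if (storageIDs != null && storageIDs.length != 0
    && storageIDs.length != datanodeID.length) {
  throw new HadoopIllegalArgumentException(
      "Mismatched lengths: datanodeID.length=" + datanodeID.length
      + " but storageIDs.length=" + storageIDs.length);
}
{code}
Separately, on the toDatanodeInfos conversion: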
{code}
+  static DatanodeInfo[] toDatanodeInfos(DatanodeStorageInfo[] storages) {
+    return toDatanodeInfos(Arrays.asList(storages));
+  }
+
+  static DatanodeInfo[] toDatanodeInfos(List<DatanodeStorageInfo> storages) {
+    final DatanodeInfo[] datanodes = new DatanodeInfo[storages.size()];
+    for(int i = 0; i < storages.size(); i++) {
+      datanodes[i] = storages.get(i).getDatanodeDescriptor();
+    }
+    return datanodes;
+  }
{code}
My question is: can we do something simpler, as below?
{code}
static DatanodeInfo[] toDatanodeInfos(DatanodeStorageInfo[] storages) {
  final DatanodeInfo[] datanodes = new DatanodeInfo[storages.length];
  for(int i = 0; i < storages.length; i++) {
    datanodes[i] = storages[i].getDatanodeDescriptor();
  }
  return datanodes;
}
{code}
Another related thing, which can probably be addressed later: in
FSNamesystem.getBlockLocations(), we call sortLocatedBlocks() to sort each
LocatedBlock only according to its distance to the client. With HDFS-2832, we
may need to take storage type into account when sorting, so an additional API
may need to be added to LocatedBlock, i.e. DatanodeInfo[]
getLocations(StorageType) (currently we only have DatanodeInfo[]
getLocations() there).
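To illustrate, such an accessor could look roughly like this (a sketch only;
the locs and storageTypes field names are my assumption of what LocatedBlock
would hold after this patch):
{code}
// Sketch only: return the subset of locations whose storage matches the
// requested type, assuming LocatedBlock keeps a storageTypes array that
// is parallel to its locs array.
public DatanodeInfo[] getLocations(StorageType type) {
  final List<DatanodeInfo> filtered = new ArrayList<DatanodeInfo>();
  for (int i = 0; i < locs.length; i++) {
    if (storageTypes[i] == type) {
      filtered.add(locs[i]);
    }
  }
  return filtered.toArray(new DatanodeInfo[filtered.size()]);
}
{code}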
More comments may come later.
> NameNode should expose storage information in the LocatedBlock
> --------------------------------------------------------------
>
> Key: HDFS-5009
> URL: https://issues.apache.org/jira/browse/HDFS-5009
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Arpit Agarwal
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5009_20130830.patch, h5009_20130831.patch
>
>
> The storage type is optionally reported by the DataNode along with the block
> report. The NameNode should make this information available to clients via
> LocatedBlock.
> Also, LocatedBlock is used in reportBadBlocks(..). We should add storage ID
> to LocatedBlock.