[ https://issues.apache.org/jira/browse/HDFS-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756098#comment-13756098 ]
Junping Du commented on HDFS-5009:
----------------------------------
Hi Nicholas, thanks for the patch. I just finished my first round of review today;
some initial comments are below. Overall, the patch looks good to me.
{code}
+  public DatanodeStorageInfo[] getDatanodeStorageInfos(
+      DatanodeID[] datanodeID, String[] storageIDs)
+      throws UnregisteredNodeException {
+    if (datanodeID.length == 0) {
+      return null;
+    }
+    final DatanodeStorageInfo[] storages = new DatanodeStorageInfo[datanodeID.length];
+    for(int i = 0; i < datanodeID.length; i++) {
+      final DatanodeDescriptor dd = getDatanode(datanodeID[i]);
+      storages[i] = storageIDs == null || storageIDs.length == 0?
+          dd.getRandomStorageInfo(): dd.getStorageInfo(storageIDs[i]);
+    }
+    return storages;
+  }
{code}
I have several questions here: Do we expect storageIDs to be null or empty
while DatanodeID[] is not empty? If so, is it proper to pick a storage
randomly? A more serious issue may be that we should check that the lengths of
storageIDs and datanodeID are equal: if datanodeID.length > storageIDs.length,
this may throw an ArrayIndexOutOfBoundsException, and a mismatch between the
indices of datanodeID and storageIDs may also cause
dd.getStorageInfo(storageIDs[i]) to return null. I think this is an unexpected
result, isn't it?
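For example, a guard along these lines (just a sketch; I am assuming a length
mismatch should be reported as an invalid argument, and that
HadoopIllegalArgumentException is an acceptable choice here) would fail fast
with a clear message instead of throwing deep inside the loop:
{code}
// Sketch only: verify the two arrays line up before indexing into them.
// Assumes a length mismatch should be reported as an invalid argument.
if (storageIDs != null && storageIDs.length != 0
    && storageIDs.length != datanodeID.length) {
  throw new HadoopIllegalArgumentException(
      "Mismatched lengths: datanodeID.length=" + datanodeID.length
      + " but storageIDs.length=" + storageIDs.length);
}
{code}
Separately, on the toDatanodeInfos conversion: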
{code}
+  static DatanodeInfo[] toDatanodeInfos(DatanodeStorageInfo[] storages) {
+    return toDatanodeInfos(Arrays.asList(storages));
+  }
+
+  static DatanodeInfo[] toDatanodeInfos(List<DatanodeStorageInfo> storages) {
+    final DatanodeInfo[] datanodes = new DatanodeInfo[storages.size()];
+    for(int i = 0; i < storages.size(); i++) {
+      datanodes[i] = storages.get(i).getDatanodeDescriptor();
+    }
+    return datanodes;
+  }
{code}
My question is: can we do something simpler, as below?
{code}
static DatanodeInfo[] toDatanodeInfos(DatanodeStorageInfo[] storages) {
  final DatanodeInfo[] datanodes = new DatanodeInfo[storages.length];
  for(int i = 0; i < storages.length; i++) {
    datanodes[i] = storages[i].getDatanodeDescriptor();
  }
  return datanodes;
}
{code}
Another related thing, which can probably be addressed later: in
FSNamesystem.getBlockLocations(), we call sortLocatedBlocks() to sort each
LocatedBlock only according to its distance to the client. With HDFS-2832, we
may need to take storage type into account when sorting, so an additional API
may need to be added to LocatedBlock, i.e. DatanodeInfo[]
getLocations(StorageType) (currently we only have DatanodeInfo[]
getLocations() there).
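To illustrate, such an accessor could look roughly like this (a sketch only;
the locs and storageTypes field names are my assumption of what LocatedBlock
would hold after this patch):
{code}
// Sketch only: return the subset of locations whose storage matches the
// requested type, assuming LocatedBlock keeps a storageTypes array that
// is parallel to its locs array.
public DatanodeInfo[] getLocations(StorageType type) {
  final List<DatanodeInfo> filtered = new ArrayList<DatanodeInfo>();
  for (int i = 0; i < locs.length; i++) {
    if (storageTypes[i] == type) {
      filtered.add(locs[i]);
    }
  }
  return filtered.toArray(new DatanodeInfo[filtered.size()]);
}
{code}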
More comments may come later.
> NameNode should expose storage information in the LocatedBlock
> --------------------------------------------------------------
>
> Key: HDFS-5009
> URL: https://issues.apache.org/jira/browse/HDFS-5009
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Arpit Agarwal
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5009_20130830.patch, h5009_20130831.patch
>
>
> The storage type is optionally reported by the DataNode along with the block
> report. The NameNode should make this information available to clients via
> LocatedBlock.
> Also, LocatedBlock is used in reportBadBlocks(..). We should add storage ID
> to LocatedBlock.