[ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076461#comment-17076461
 ] 

Stephen O'Donnell commented on HDFS-15255:
------------------------------------------

I added some debug logging to see what happens when both options are true for 
the comparator:

{code}
  private Consumer<List<DatanodeInfoWithStorage>> createSecondaryNodeSorter() {
    Consumer<List<DatanodeInfoWithStorage>> secondarySort =
        list -> {
          LOG.info("Running the shuffle");
          Collections.shuffle(list);
        };
    if (readConsiderStorageType) {
      LOG.info("Read consider storage set");
      Comparator<DatanodeInfoWithStorage> comp =
          Comparator.comparing(DatanodeInfoWithStorage::getStorageType);
      secondarySort = list -> {
        LOG.info("Running storage sort");
        Collections.sort(list, comp);
      };
    }

    if (readConsiderLoad) {
      LOG.info("Read consider load set");
      Comparator<DatanodeInfoWithStorage> comp =
          Comparator.comparingInt(DatanodeInfo::getXceiverCount);
      secondarySort = list -> {
        LOG.info("Running with load set");
        Collections.sort(list, comp);
      };
    }
    return secondarySort;
  }
{code}

Changing one of the unit tests and running with this extra logging shows that 
only the last sorter assigned is used: when both options are set, the 
load-based sort replaces the storage-type sort. That makes the two features 
incompatible. I think that is OK; it just needs to be documented in 
hdfs-site.xml for both parameters.
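
For anyone who wants to reproduce the override without a cluster, here is a minimal standalone sketch of the same assignment pattern. The Node class, the StorageKind enum, and the sample values are hypothetical stand-ins for illustration, not the real DatanodeInfoWithStorage or StorageType classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

class SecondarySortDemo {
  // Hypothetical storage enum; declaration order defines the sort order.
  enum StorageKind { SSD, DISK }

  // Hypothetical stand-in for DatanodeInfoWithStorage.
  static final class Node {
    final String name;
    final StorageKind storage;
    final int xceiverCount;

    Node(String name, StorageKind storage, int xceiverCount) {
      this.name = name;
      this.storage = storage;
      this.xceiverCount = xceiverCount;
    }
  }

  // Same shape as the method above: each enabled option reassigns the
  // sorter, so the last assignment is the only one that runs.
  static Consumer<List<Node>> createSorter(boolean considerStorage,
                                           boolean considerLoad) {
    Consumer<List<Node>> sorter = list -> Collections.shuffle(list);
    if (considerStorage) {
      Comparator<Node> comp = Comparator.comparing((Node n) -> n.storage);
      sorter = list -> list.sort(comp);
    }
    if (considerLoad) {
      Comparator<Node> comp = Comparator.comparingInt((Node n) -> n.xceiverCount);
      sorter = list -> list.sort(comp); // replaces the storage sorter
    }
    return sorter;
  }

  // Sorts a fixed sample list and returns the node names in order.
  static List<String> sortedNames(boolean considerStorage,
                                  boolean considerLoad) {
    List<Node> nodes = new ArrayList<>(Arrays.asList(
        new Node("dn1", StorageKind.DISK, 1),
        new Node("dn2", StorageKind.SSD, 9),
        new Node("dn3", StorageKind.DISK, 5)));
    createSorter(considerStorage, considerLoad).accept(nodes);
    List<String> names = new ArrayList<>();
    for (Node n : nodes) {
      names.add(n.name);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println("storage only: " + sortedNames(true, false));
    System.out.println("both set:     " + sortedNames(true, true));
  }
}
```

With both flags set in this sketch, the SSD node sorts last because it has the highest xceiver count, so the storage preference has no effect on the result.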

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> ------------------------------------------------------------
>
>                 Key: HDFS-15255
>                 URL: https://issues.apache.org/jira/browse/HDFS-15255
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-15255.001.patch, HDFS-15255.002.patch
>
>
> When only one replica of a block is on SSD and the others are on HDD, 
> the current logic for client reads considers the 
> distance between the client and the DN. I think it should also consider the 
> StorageType of the replica, giving priority to a replica of the specified 
> StorageType.



