ConfX created HDFS-17098:
----------------------------

             Summary: DatanodeManager does not handle null storage type properly
                 Key: HDFS-17098
                 URL: https://issues.apache.org/jira/browse/HDFS-17098
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened:

Got a {{NullPointerException}} with no message when sorting datanodes in {{NetworkTopology}}.

h2. Where's the bug:

In line 654 of {{DatanodeManager}}, the manager creates a secondary sorter using the standard {{Comparator}} class:
{noformat}
Comparator<DatanodeInfoWithStorage> comp =
    Comparator.comparing(DatanodeInfoWithStorage::getStorageType);
secondarySort = list -> Collections.sort(list, comp);
{noformat}
This comparator is then used in {{NetworkTopology}} as a secondary sort to break ties:
{noformat}
if (secondarySort != null) {
  // a secondary sort breaks the tie between nodes.
  secondarySort.accept(nodesList);
}
{noformat}
However, if the storage type of any node is {{null}}, a {{NullPointerException}} is thrown, because the comparator returned by {{Comparator.comparing}} calls {{compareTo}} on the extracted keys and does not tolerate null keys.

h2. How to reproduce:

(1) Set {{dfs.heartbeat.interval}} to {{1753310367}}, and {{dfs.namenode.read.considerStorageType}} to {{true}}
(2) Run test: {{org.apache.hadoop.hdfs.server.blockmanagement.TestSortLocatedBlock#testAviodStaleAndSlowDatanodes}}

h2. Stacktrace:
{noformat}
java.lang.NullPointerException
    at java.base/java.util.Comparator.lambda$comparing$77a9974f$1(Comparator.java:469)
    at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
    at java.base/java.util.TimSort.sort(TimSort.java:220)
    at java.base/java.util.Arrays.sort(Arrays.java:1515)
    at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
    at java.base/java.util.Collections.sort(Collections.java:179)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.lambda$createSecondaryNodeSorter$0(DatanodeManager.java:654)
    at org.apache.hadoop.net.NetworkTopology.sortByDistance(NetworkTopology.java:983)
    at org.apache.hadoop.net.NetworkTopology.sortByDistanceUsingNetworkLocation(NetworkTopology.java:946)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlock(DatanodeManager.java:637)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:554)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestSortLocatedBlock.testAviodStaleAndSlowDatanodes(TestSortLocatedBlock.java:144)
{noformat}
For an easy reproduction, run the {{reproduce.sh}} in the attachment.

We are happy to provide a patch if this issue is confirmed.
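
The failure mode in this report can be illustrated outside HDFS with a minimal, self-contained sketch. The {{Kind}} enum and {{Node}} class below are hypothetical stand-ins for the storage type and {{DatanodeInfoWithStorage}}; they are not Hadoop classes. The null-tolerant comparator is only one possible direction for a fix, and placing null keys last (rather than first) is an assumption made for illustration, not a decision taken in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullStorageTypeSortSketch {
  // Hypothetical stand-in for a storage type enum (not the Hadoop class).
  enum Kind { RAM_DISK, SSD, DISK, ARCHIVE }

  // Hypothetical stand-in for DatanodeInfoWithStorage; kind may be null.
  static final class Node {
    final String name;
    final Kind kind;

    Node(String name, Kind kind) {
      this.name = name;
      this.kind = kind;
    }

    Kind getKind() {
      return kind;
    }
  }

  // Same shape as the comparator quoted in the report: it calls compareTo
  // on the extracted key, so a null key causes a NullPointerException.
  static final Comparator<Node> STRICT = Comparator.comparing(Node::getKind);

  // Null-tolerant variant: nodes whose kind is null sort after all others.
  static final Comparator<Node> NULL_SAFE =
      Comparator.comparing(Node::getKind,
          Comparator.nullsLast(Comparator.naturalOrder()));

  // Sorts a copy of the list and returns the node names in sorted order.
  static List<String> sortedNames(List<Node> nodes, Comparator<Node> comp) {
    List<Node> copy = new ArrayList<>(nodes);
    copy.sort(comp);
    List<String> names = new ArrayList<>();
    for (Node n : copy) {
      names.add(n.name);
    }
    return names;
  }

  // Returns true if sorting with the given comparator throws an NPE.
  static boolean throwsNpe(List<Node> nodes, Comparator<Node> comp) {
    try {
      sortedNames(nodes, comp);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("dn1", Kind.DISK),
        new Node("dn2", null),
        new Node("dn3", Kind.SSD));

    System.out.println("strict comparator throws NPE: " + throwsNpe(nodes, STRICT));
    System.out.println("null-safe order: " + sortedNames(nodes, NULL_SAFE));
  }
}
```

With {{Comparator.nullsLast}}, the node without a storage type is moved to the end of the list instead of aborting the sort. Whether such nodes should sort first, sort last, or be prevented from having a null storage type at all is a design choice left to the actual patch.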