ConfX created HDFS-17098:
----------------------------

             Summary: DatanodeManager does not handle null storage type properly
                 Key: HDFS-17098
                 URL: https://issues.apache.org/jira/browse/HDFS-17098
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened:

A {{NullPointerException}} with no message is thrown when sorting datanodes in 
{{NetworkTopology}}.
h2. Where's the bug:

In line 654 of {{DatanodeManager}}, the manager builds a secondary sorter from 
a comparator created with the standard {{Comparator.comparing}} factory:
{noformat}
Comparator<DatanodeInfoWithStorage> comp =
        Comparator.comparing(DatanodeInfoWithStorage::getStorageType);
secondarySort = list -> Collections.sort(list, comp);{noformat}
This comparator is then used in {{NetworkTopology}} as a secondary sort to 
break ties:
{noformat}
if (secondarySort != null) {
        // a secondary sort breaks the tie between nodes.
        secondarySort.accept(nodesList);
}{noformat}
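To make the tie-breaking step concrete, here is a simplified, hypothetical model of it, not the actual {{NetworkTopology#sortByDistance}} code: nodes are bucketed by a distance weight, and the secondary sorter then reorders each bucket of equal weight. {{TieBreakSketch}}, {{Node}}, {{sortWithTieBreak}} and the stand-in {{StorageType}} enum (whose constants are not Hadoop's real list) are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

// Hypothetical, simplified model of tie-breaking; NOT the real NetworkTopology code.
public class TieBreakSketch {
  // Hypothetical stand-in for Hadoop's StorageType enum.
  enum StorageType { SSD, DISK, ARCHIVE }

  // Hypothetical stand-in for DatanodeInfoWithStorage.
  static final class Node {
    final String name;
    final int weight; // distance from the reader; lower is closer
    private final StorageType type;

    Node(String name, int weight, StorageType type) {
      this.name = name;
      this.weight = weight;
      this.type = type;
    }

    StorageType getStorageType() {
      return type;
    }
  }

  /** Orders nodes by weight, then lets secondarySort reorder each group of equal weight. */
  static <T> List<T> sortWithTieBreak(List<T> nodes, ToIntFunction<T> weight,
                                      Consumer<List<T>> secondarySort) {
    // Bucket nodes by weight; TreeMap iterates the buckets in ascending weight order.
    TreeMap<Integer, List<T>> buckets = new TreeMap<>();
    for (T node : nodes) {
      buckets.computeIfAbsent(weight.applyAsInt(node), k -> new ArrayList<>()).add(node);
    }
    List<T> result = new ArrayList<>(nodes.size());
    for (List<T> tied : buckets.values()) {
      if (secondarySort != null) {
        // A secondary sort breaks the tie between nodes of equal weight.
        secondarySort.accept(tied);
      }
      result.addAll(tied);
    }
    return result;
  }

  /** Returns node names after sorting with a storage-type tie-breaker. */
  static List<String> demo() {
    List<Node> nodes = Arrays.asList(
        new Node("dn1", 2, StorageType.ARCHIVE),
        new Node("dn2", 0, StorageType.DISK),
        new Node("dn3", 2, StorageType.SSD),
        new Node("dn4", 0, StorageType.SSD));
    // Same shape as the comparator quoted from DatanodeManager.
    Comparator<Node> comp = Comparator.comparing(Node::getStorageType);
    List<Node> sorted = sortWithTieBreak(nodes, n -> n.weight,
        list -> Collections.sort(list, comp));
    List<String> names = new ArrayList<>();
    for (Node n : sorted) {
      names.add(n.name);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [dn4, dn2, dn3, dn1]
  }
}
```

Only nodes that share a weight are ever compared by storage type, so the secondary comparator runs whenever a bucket holds two or more nodes.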
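However, if any node's storage type is {{null}}, sorting throws a 
{{NullPointerException}}: the comparator returned by {{Comparator.comparing}} 
calls {{compareTo}} on the extracted keys and does not tolerate {{null}} keys, 
so a single {{null}} storage type is enough to break the sort.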
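The failure can be reproduced outside HDFS with a minimal sketch. It assumes a hypothetical stand-in {{StorageType}} enum and a hypothetical {{Node}} class in place of {{DatanodeInfoWithStorage}}; the comparator has the same shape as the one quoted from {{DatanodeManager}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NullKeySortSketch {
  // Hypothetical stand-in for Hadoop's StorageType enum.
  enum StorageType { SSD, DISK, ARCHIVE }

  // Hypothetical stand-in for DatanodeInfoWithStorage; the storage type may be null.
  static final class Node {
    private final StorageType type;

    Node(StorageType type) {
      this.type = type;
    }

    StorageType getStorageType() {
      return type;
    }
  }

  /** Sorts nodes with the given storage types; returns true if the sort throws an NPE. */
  static boolean sortThrowsNpe(StorageType... types) {
    List<Node> nodes = new ArrayList<>();
    for (StorageType t : types) {
      nodes.add(new Node(t));
    }
    // Same shape as the comparator built in DatanodeManager.
    Comparator<Node> comp = Comparator.comparing(Node::getStorageType);
    try {
      Collections.sort(nodes, comp);
      return false;
    } catch (NullPointerException e) {
      // comparing() effectively evaluates key(a).compareTo(key(b)); a null key fails
      // either as the receiver or as the argument to Enum.compareTo.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(sortThrowsNpe(StorageType.DISK, StorageType.SSD)); // false
    System.out.println(sortThrowsNpe(StorageType.DISK, null));            // true
  }
}
```

With two elements, {{TimSort}} compares them in {{countRunAndMakeAscending}}, which matches the top frames of the stack trace below.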
h2. How to reproduce:

(1) Set {{dfs.heartbeat.interval}} to {{1753310367}} and 
{{dfs.namenode.read.considerStorageType}} to {{true}}.
(2) Run the test 
{{org.apache.hadoop.hdfs.server.blockmanagement.TestSortLocatedBlock#testAviodStaleAndSlowDatanodes}}.
h2. Stacktrace:
{noformat}
java.lang.NullPointerException
    at java.base/java.util.Comparator.lambda$comparing$77a9974f$1(Comparator.java:469)
    at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
    at java.base/java.util.TimSort.sort(TimSort.java:220)
    at java.base/java.util.Arrays.sort(Arrays.java:1515)
    at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
    at java.base/java.util.Collections.sort(Collections.java:179)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.lambda$createSecondaryNodeSorter$0(DatanodeManager.java:654)
    at org.apache.hadoop.net.NetworkTopology.sortByDistance(NetworkTopology.java:983)
    at org.apache.hadoop.net.NetworkTopology.sortByDistanceUsingNetworkLocation(NetworkTopology.java:946)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlock(DatanodeManager.java:637)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:554)
    at org.apache.hadoop.hdfs.server.blockmanagement.TestSortLocatedBlock.testAviodStaleAndSlowDatanodes(TestSortLocatedBlock.java:144)
{noformat}
For easy reproduction, run the attached {{reproduce.sh}}. We are happy to 
provide a patch if this issue is confirmed.
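One possible direction for such a patch, sketched here under the assumption that nodes with an unknown storage type should simply be ordered after those with a known one (the actual patch may choose differently), is to pass {{Comparator.comparing}} an explicit null-tolerant key comparator built with {{Comparator.nullsLast}}. The {{Node}} class and {{StorageType}} enum below are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NullSafeSortSketch {
  // Hypothetical stand-in for Hadoop's StorageType enum.
  enum StorageType { SSD, DISK, ARCHIVE }

  // Hypothetical stand-in for DatanodeInfoWithStorage; the storage type may be null.
  static final class Node {
    final String name;
    private final StorageType type;

    Node(String name, StorageType type) {
      this.name = name;
      this.type = type;
    }

    StorageType getStorageType() {
      return type;
    }
  }

  // Null-tolerant comparator: known storage types keep their natural (enum) order,
  // and nodes whose storage type is null are placed after all of them.
  static final Comparator<Node> COMP = Comparator.comparing(
      Node::getStorageType, Comparator.nullsLast(Comparator.naturalOrder()));

  /** Returns node names in the order produced by COMP. */
  static List<String> sortedNames(List<Node> nodes) {
    List<Node> copy = new ArrayList<>(nodes);
    Collections.sort(copy, COMP);
    List<String> names = new ArrayList<>();
    for (Node n : copy) {
      names.add(n.name);
    }
    return names;
  }

  static List<String> demo() {
    return sortedNames(Arrays.asList(
        new Node("dn1", null),
        new Node("dn2", StorageType.DISK),
        new Node("dn3", StorageType.SSD)));
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [dn3, dn2, dn1]
  }
}
```

Applied to the line quoted from {{DatanodeManager}}, the same idea would read {{Comparator.comparing(DatanodeInfoWithStorage::getStorageType, Comparator.nullsLast(Comparator.naturalOrder()))}}. Whether {{null}} types should sort first, sort last, or be rejected earlier with a descriptive error is a design decision for the real fix.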



--
This message was sent by Atlassian Jira
(v8.20.10#820010)