[ https://issues.apache.org/jira/browse/HDFS-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202341#comment-16202341 ]
Chen Liang commented on HDFS-12415:
-----------------------------------

I looked into this a little bit too. What seems to be happening is that {{SCMCommonPolicy#chooseDatanodes}} calls {{nodeManager.getNodes(OzoneProtos.NodeState.HEALTHY);}}, but the returned list contains a {{null}} datanode id entry, so the {{hasEnoughSpace(d, sizeRequired)}} call on the null {{d}} fails with an NPE. The list with the null entry comes from {{SCMNodeManager#getNodes}}, where it seems some datanode id is present in {{healthyNodes}} but not in the {{nodes}} map.

I don't see how a datanode id could be present in {{healthyNodes}} but not in {{nodes}}, because the first thing register does is always add the datanode to {{nodes}}, before adding it to {{healthyNodes}}. I can only think of the cause being what [~msingh] mentioned: probably an unexpected race condition when two register calls happen at the same time and concurrently modify the HashMap {{nodes}}. So I would +1 on Mukul's change.

Additionally, I ran {{TestXceiverClientManager}} dozens of times with the v005 patch applied, and the test did not fail.
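The race hypothesis above can be illustrated with a small, self-contained Java sketch. It is not the v005 patch and does not reproduce the real {{SCMNodeManager}} API: the class {{NodeRegistrySketch}}, its {{Node}} type, and the {{register}}, {{getHealthyNodes}} and {{hasEnoughSpace}} methods are hypothetical simplifications. It assumes one plausible remedy (not necessarily what Mukul's change does): back {{nodes}} and {{healthyNodes}} with thread-safe collections so concurrent register calls cannot corrupt them, and, as an extra defensive step, skip ids that have no entry in {{nodes}} instead of returning null elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, heavily simplified model of the node bookkeeping discussed
 * above. None of these names come from the Ozone code base.
 */
public class NodeRegistrySketch {

  /** Hypothetical stand-in for a registered datanode and its free space. */
  public static final class Node {
    public final String id;
    public final long freeBytes;

    public Node(String id, long freeBytes) {
      this.id = id;
      this.freeBytes = freeBytes;
    }
  }

  // Thread-safe replacements for a plain HashMap and set; concurrent
  // register() calls can no longer corrupt these structures.
  private final Map<String, Node> nodes = new ConcurrentHashMap<>();
  private final Set<String> healthyNodes = ConcurrentHashMap.newKeySet();

  /** Mirrors the ordering described in the comment: nodes first, then healthyNodes. */
  public void register(String id, long freeBytes) {
    nodes.put(id, new Node(id, freeBytes));
    healthyNodes.add(id);
  }

  /** Resolves healthy ids to Node objects, skipping any id missing from nodes. */
  public List<Node> getHealthyNodes() {
    List<Node> result = new ArrayList<>();
    for (String id : healthyNodes) {
      Node node = nodes.get(id);
      if (node != null) { // defensive: never hand a null element to callers
        result.add(node);
      }
    }
    return result;
  }

  /** Analogue of hasEnoughSpace: dereferences node, so a null would throw an NPE. */
  public static boolean hasEnoughSpace(Node node, long sizeRequired) {
    return node.freeBytes >= sizeRequired;
  }

  public static void main(String[] args) throws InterruptedException {
    NodeRegistrySketch registry = new NodeRegistrySketch();
    ExecutorService pool = Executors.newFixedThreadPool(8);
    // Simulate many datanodes registering concurrently.
    for (int i = 0; i < 1000; i++) {
      final String id = "dn-" + i;
      final long free = i * 1024L;
      pool.submit(() -> registry.register(id, free));
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);

    List<Node> healthy = registry.getHealthyNodes();
    long fit = healthy.stream().filter(n -> hasEnoughSpace(n, 512L * 1024)).count();
    // prints "1000 healthy, 488 with enough space"
    System.out.println(healthy.size() + " healthy, " + fit + " with enough space");
  }
}
```

The thread-safe maps are the substantive change in this sketch; the null check in {{getHealthyNodes}} only keeps a corrupted view from reaching placement code and would hide, not fix, a lost update. With a plain {{HashMap}} in place of {{ConcurrentHashMap}}, concurrent {{put}} calls can lose entries, which would show up as exactly this kind of intermittent failure.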
> Ozone: TestXceiverClientManager and TestAllocateContainer occasionally fails
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-12415
>                 URL: https://issues.apache.org/jira/browse/HDFS-12415
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: HDFS-12415-HDFS-7240.001.patch, HDFS-12415-HDFS-7240.002.patch, HDFS-12415-HDFS-7240.003.patch, HDFS-12415-HDFS-7240.004.patch, HDFS-12415-HDFS-7240.005.patch
>
>
> TestXceiverClientManager seems to be occasionally failing in some jenkins jobs,
> {noformat}
> java.lang.NullPointerException
> 	at org.apache.hadoop.ozone.scm.node.SCMNodeManager.getNodeStat(SCMNodeManager.java:828)
> 	at org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.hasEnoughSpace(SCMCommonPolicy.java:147)
> 	at org.apache.hadoop.ozone.scm.container.placement.algorithms.SCMCommonPolicy.lambda$chooseDatanodes$0(SCMCommonPolicy.java:125)
> {noformat}
> see more from [this report|https://builds.apache.org/job/PreCommit-HDFS-Build/21065/testReport/]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)