Ethan Rose created HDDS-5336:
--------------------------------

             Summary: Fix datanode capacity related race condition
                 Key: HDDS-5336
                 URL: https://issues.apache.org/jira/browse/HDDS-5336
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Ethan Rose
            Assignee: Ethan Rose


After merging master into the upgrade branch in HDDS-5321, an intermittent 
failure was noticed in TestSCMNodeManager#testLayoutOnHeartbeat: 
https://github.com/apache/ozone/runs/2787582345

The issue occurs in SCMNodeManager#register, where the node is added to the 
nodeStateManager firing the NEW_NODE event, before the node report containing 
storage information for the new node is processed. The event triggers a one 
shot run on the background pipeline creator which will read the node's storage 
information to determine if it can hold a pipeline. If the storage report has 
not yet been processed when this happens, no pipeline will be created to use 
the new node when it is registered, because the node still appears to have no 
free space.

Relevant log lines from the test failure:

{code:log}
2021-06-09 21:04:44,087 [Listener at 0.0.0.0/34005] INFO  
net.NetworkTopologyImpl (NetworkTopologyImpl.java:add(112)) - Added a new node: 
/default-rack/b06583c0-2c53-452b-83e4-398ff0104f72
2021-06-09 21:04:44,087 [RatisPipelineUtilsThread - 0] WARN  
pipeline.PipelinePlacementPolicy 
(PipelinePlacementPolicy.java:filterViableNodes(151)) - Pipeline creation 
failed due to no sufficient healthy datanodes. Required 3. Found 2.
2021-06-09 21:04:44,088 [EventQueue-NewNodeForNewNodeHandler] INFO  
pipeline.BackgroundPipelineCreator 
(BackgroundPipelineCreatorV2.java:notifyEventTriggered(282)) - trigger a 
one-shot run on RatisPipelineUtilsThread.
2021-06-09 21:04:44,088 [RatisPipelineUtilsThread - 0] INFO  
pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$0(170)) - Sending 
CreatePipelineCommand for 
pipeline:PipelineID=8bfba789-d337-4fed-9eb6-b1debd3d19e8 to 
datanode:b06583c0-2c53-452b-83e4-398ff0104f72
2021-06-09 21:04:44,089 [RatisPipelineUtilsThread - 0] INFO  
pipeline.PipelineStateManager 
(PipelineStateManagerV2Impl.java:addPipeline(101)) - Created pipeline Pipeline[ 
Id: 8bfba789-d337-4fed-9eb6-b1debd3d19e8, Nodes: 
b06583c0-2c53-452b-83e4-398ff0104f72{ip: 187.106.219.59, host: 
localhost-187.106.219.59, ports: [STANDALONE=0, RATIS=0, REST=0, REPLICATION=0, 
RATIS_ADMIN=0, RATIS_SERVER=0], networkLocation: /default-rack, certSerialId: 
null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, 
ReplicationConfig: RATIS/ONE, State:ALLOCATED, leaderId:, 
CreationTimestamp2021-06-09T21:04:44.088Z].
2021-06-09 21:04:44,089 [RatisPipelineUtilsThread - 0] INFO  
ha.SCMHAInvocationHandler (SCMHAInvocationHandler.java:invokeRatis(113)) - 
Invoking method public abstract void 
org.apache.hadoop.hdds.scm.pipeline.StateManager.addPipeline(org.apache.hadoop.hdds.protocol.proto.HddsProtos$Pipeline)
 throws java.io.IOException on target 
org.apache.hadoop.hdds.scm.ha.MockSCMHAManager$MockRatisServer@5bf60155, cost 
655.117us
2021-06-09 21:04:44,091 [RatisPipelineUtilsThread - 0] WARN  
pipeline.PipelinePlacementPolicy 
(PipelinePlacementPolicy.java:filterViableNodes(170)) - Pipeline creation 
failed due to no sufficient healthy datanodes with enough space for even a 
single container. Required 3. Found 2. Container size 5368709120.
2021-06-09 21:04:44,092 [Listener at 0.0.0.0/34005] INFO  node.SCMNodeManager 
(SCMNodeManager.java:register(386)) - Registered Data node : 
b06583c0-2c53-452b-83e4-398ff0104f72{ip: 187.106.219.59, host: 
localhost-187.106.219.59, ports: [STANDALONE=0, RATIS=0, REST=0, REPLICATION=0, 
RATIS_ADMIN=0, RATIS_SERVER=0], networkLocation: /default-rack, certSerialId: 
null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
2021-06-09 21:04:44,093 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 3 pipelines of 
type RATIS and factor ONE.
2021-06-09 21:04:44,093 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:45,094 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:46,094 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:47,095 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:48,096 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:49,096 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:50,097 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:51,097 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:52,098 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:53,098 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
2021-06-09 21:04:54,099 [Listener at 0.0.0.0/34005] INFO  
node.TestSCMNodeManager 
(TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines of 
type RATIS and factor THREE.
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to