[
https://issues.apache.org/jira/browse/HDDS-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-5336:
---------------------------------
Labels: pull-request-available (was: )
> Fix datanode capacity related race condition
> --------------------------------------------
>
> Key: HDDS-5336
> URL: https://issues.apache.org/jira/browse/HDDS-5336
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Ethan Rose
> Priority: Major
> Labels: pull-request-available
>
> After merging master into the upgrade branch in HDDS-5321, an intermittent
> failure was noticed in TestSCMNodeManager#testLayoutOnHeartbeat:
> https://github.com/apache/ozone/runs/2787582345
> The issue occurs in SCMNodeManager#register, where the node is added to the
> nodeStateManager firing the NEW_NODE event, before the node report containing
> storage information for the new node is processed. The event triggers a one
> shot run on the background pipeline creator which will read the node's
> storage information to determine if it can hold a pipeline. If the storage
> report has not yet been processed when this happens, no pipeline will be
> created to use the new node when it is registered, because the node still
> appears to have no free space.
> Relevant log lines from the test failure:
> {code}
> 2021-06-09 21:04:44,087 [Listener at 0.0.0.0/34005] INFO
> net.NetworkTopologyImpl (NetworkTopologyImpl.java:add(112)) - Added a new
> node: /default-rack/b06583c0-2c53-452b-83e4-398ff0104f72
> 2021-06-09 21:04:44,087 [RatisPipelineUtilsThread - 0] WARN
> pipeline.PipelinePlacementPolicy
> (PipelinePlacementPolicy.java:filterViableNodes(151)) - Pipeline creation
> failed due to no sufficient healthy datanodes. Required 3. Found 2.
> 2021-06-09 21:04:44,088 [EventQueue-NewNodeForNewNodeHandler] INFO
> pipeline.BackgroundPipelineCreator
> (BackgroundPipelineCreatorV2.java:notifyEventTriggered(282)) - trigger a
> one-shot run on RatisPipelineUtilsThread.
> 2021-06-09 21:04:44,088 [RatisPipelineUtilsThread - 0] INFO
> pipeline.RatisPipelineProvider
> (RatisPipelineProvider.java:lambda$create$0(170)) - Sending
> CreatePipelineCommand for
> pipeline:PipelineID=8bfba789-d337-4fed-9eb6-b1debd3d19e8 to
> datanode:b06583c0-2c53-452b-83e4-398ff0104f72
> 2021-06-09 21:04:44,089 [RatisPipelineUtilsThread - 0] INFO
> pipeline.PipelineStateManager
> (PipelineStateManagerV2Impl.java:addPipeline(101)) - Created pipeline
> Pipeline[ Id: 8bfba789-d337-4fed-9eb6-b1debd3d19e8, Nodes:
> b06583c0-2c53-452b-83e4-398ff0104f72{ip: 187.106.219.59, host:
> localhost-187.106.219.59, ports: [STANDALONE=0, RATIS=0, REST=0,
> REPLICATION=0, RATIS_ADMIN=0, RATIS_SERVER=0], networkLocation:
> /default-rack, certSerialId: null, persistedOpState: IN_SERVICE,
> persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/ONE,
> State:ALLOCATED, leaderId:, CreationTimestamp2021-06-09T21:04:44.088Z].
> 2021-06-09 21:04:44,089 [RatisPipelineUtilsThread - 0] INFO
> ha.SCMHAInvocationHandler (SCMHAInvocationHandler.java:invokeRatis(113)) -
> Invoking method public abstract void
> org.apache.hadoop.hdds.scm.pipeline.StateManager.addPipeline(org.apache.hadoop.hdds.protocol.proto.HddsProtos$Pipeline)
> throws java.io.IOException on target
> org.apache.hadoop.hdds.scm.ha.MockSCMHAManager$MockRatisServer@5bf60155, cost
> 655.117us
> 2021-06-09 21:04:44,091 [RatisPipelineUtilsThread - 0] WARN
> pipeline.PipelinePlacementPolicy
> (PipelinePlacementPolicy.java:filterViableNodes(170)) - Pipeline creation
> failed due to no sufficient healthy datanodes with enough space for even a
> single container. Required 3. Found 2. Container size 5368709120.
> 2021-06-09 21:04:44,092 [Listener at 0.0.0.0/34005] INFO node.SCMNodeManager
> (SCMNodeManager.java:register(386)) - Registered Data node :
> b06583c0-2c53-452b-83e4-398ff0104f72{ip: 187.106.219.59, host:
> localhost-187.106.219.59, ports: [STANDALONE=0, RATIS=0, REST=0,
> REPLICATION=0, RATIS_ADMIN=0, RATIS_SERVER=0], networkLocation:
> /default-rack, certSerialId: null, persistedOpState: IN_SERVICE,
> persistedOpStateExpiryEpochSec: 0}
> 2021-06-09 21:04:44,093 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 3 pipelines
> of type RATIS and factor ONE.
> 2021-06-09 21:04:44,093 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:45,094 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:46,094 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:47,095 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:48,096 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:49,096 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:50,097 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:51,097 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:52,098 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:53,098 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> 2021-06-09 21:04:54,099 [Listener at 0.0.0.0/34005] INFO
> node.TestSCMNodeManager
> (TestSCMNodeManager.java:lambda$assertPipelines$10(463)) - Found 0 pipelines
> of type RATIS and factor THREE.
> {code}
> Note that the new node is the third node registered, so we would expect a
> Ratis factor three pipeline to be created after this event. Factor one
> pipeline creation succeeds for this new node due to HDDS-5337, although this
> is not related to this test failure.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]