[
https://issues.apache.org/jira/browse/HDDS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Pogde updated HDDS-3107:
---------------------------------
Target Version/s: 1.2.0
I am managing the 1.1.0 release and we currently have more than 600 issues
targeted for 1.1.0. I am moving the target field to 1.2.0.
If you are actively working on this Jira and believe it should be targeted to
the 1.1.0 release, please change the target field back to 1.1.0 before Feb 05,
2021.
> Pipelines may not be rack aware on cluster startup
> --------------------------------------------------
>
> Key: HDDS-3107
> URL: https://issues.apache.org/jira/browse/HDDS-3107
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Affects Versions: 1.0.0
> Reporter: Stephen O'Donnell
> Priority: Major
> Attachments: docker-ozone-topology-ozone-topology-readdata-scm.log
>
>
> Given a 6 node cluster with 2 racks, so that there are 3 nodes per rack, it is
> possible for a pipeline to be created in a non-rack-aware way on startup.
> Using a robot test like the one in HDDS-3084, I can intermittently see that
> if all nodes from one rack register first, pipeline creation is triggered
> on them, resulting in a pipeline which is entirely on one rack. Then the
> next 3 nodes register and, as there are no nodes available on the other rack,
> they too join a "one rack" pipeline.
> This log snippet shows this happening. I will attach the full docker-compose
> log:
> {code}
> egrep "Sending CreatePipelineCommand|Registered Data node|Created pipe" docker-ozone-topology-ozone-topology-readdata-scm.log
> scm_1 | 2020-02-28 12:27:57,826 [IPC Server handler 6 on 9861] INFO
> node.SCMNodeManager: Registered Data node :
> 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host:
> ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:57,840 [IPC Server handler 9 on 9861] INFO
> node.SCMNodeManager: Registered Data node :
> 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host:
> ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:57,903 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=16806a56-8e35-46b2-aefd-cb5232d6f5f7 to
> datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1 | 2020-02-28 12:27:57,924 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> 16806a56-8e35-46b2-aefd-cb5232d6f5f7, Nodes:
> 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host:
> ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
> CreationTimestamp2020-02-28T12:27:57.891553Z]
> scm_1 | 2020-02-28 12:27:57,932 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=5a3edf1e-84f6-48ef-a333-6f3e924898a6 to
> datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1 | 2020-02-28 12:27:57,933 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> 5a3edf1e-84f6-48ef-a333-6f3e924898a6, Nodes:
> 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host:
> ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
> CreationTimestamp2020-02-28T12:27:57.932422Z]
> scm_1 | 2020-02-28 12:27:58,213 [IPC Server handler 8 on 9861] INFO
> node.SCMNodeManager: Registered Data node :
> 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host:
> ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=ba2034fc-cb11-482a-9843-435294862240 to
> datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1 | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> ba2034fc-cb11-482a-9843-435294862240, Nodes:
> 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host:
> ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
> CreationTimestamp2020-02-28T12:27:58.216275Z]
> scm_1 | 2020-02-28 12:27:58,218 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to
> datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1 | 2020-02-28 12:27:58,219 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to
> datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1 | 2020-02-28 12:27:58,220 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to
> datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1 | 2020-02-28 12:27:58,221 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> 4f16913d-ec06-44b4-a577-6664a517e401, Nodes:
> 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host:
> ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host:
> ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host:
> ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1,
> certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED,
> leaderId:null, CreationTimestamp2020-02-28T12:27:58.218896Z]
> scm_1 | 2020-02-28 12:27:58,645 [IPC Server handler 7 on 9861] INFO
> node.SCMNodeManager: Registered Data node :
> 66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host:
> ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:58,645 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=4739840f-8bb3-4742-ac5e-ac519b51e0fd to
> datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
> scm_1 | 2020-02-28 12:27:58,647 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> 4739840f-8bb3-4742-ac5e-ac519b51e0fd, Nodes:
> 66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host:
> ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
> CreationTimestamp2020-02-28T12:27:58.645455Z]
> scm_1 | 2020-02-28 12:27:59,339 [IPC Server handler 7 on 9861] INFO
> node.SCMNodeManager: Registered Data node :
> 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host:
> ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:59,340 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=555b9a1d-1c4a-4d9f-b198-492da7005ccd to
> datanode:9be38eea-bacc-434a-876d-50b105d4daa2
> scm_1 | 2020-02-28 12:27:59,341 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> 555b9a1d-1c4a-4d9f-b198-492da7005ccd, Nodes:
> 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host:
> ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
> CreationTimestamp2020-02-28T12:27:59.340193Z]
> scm_1 | 2020-02-28 12:27:59,672 [IPC Server handler 6 on 9861] INFO
> node.SCMNodeManager: Registered Data node :
> cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host:
> ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:59,673 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=a6d77ef7-52c0-4f6a-8c22-f0b405da08a1 to
> datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
> scm_1 | 2020-02-28 12:27:59,674 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> a6d77ef7-52c0-4f6a-8c22-f0b405da08a1, Nodes:
> cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host:
> ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null,
> CreationTimestamp2020-02-28T12:27:59.673585Z]
> scm_1 | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to
> datanode:9be38eea-bacc-434a-876d-50b105d4daa2
> scm_1 | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to
> datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
> scm_1 | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for
> pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to
> datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
> scm_1 | 2020-02-28 12:27:59,684 [RatisPipelineUtilsThread] INFO
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id:
> 70cfd35d-b778-42df-bcba-3ba14bd8ead0, Nodes:
> 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host:
> ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2,
> certSerialId: null}66ec72b2-4be5-453f-ac44-cc9857bad
> {code}
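The same check can be done mechanically rather than by reading the egrep output. Below is a minimal sketch in plain Python, assuming each log entry sits on a single line in the real log file (the wrapping above comes from the mail client) and that the "Created pipeline" format is exactly as shown in the snippet. The function name and the regular expressions are illustrative assumptions, not part of Ozone.

```python
import re

# Both patterns are derived only from the log format visible in the snippet above.
PIPELINE_RE = re.compile(
    r"Created pipeline Pipeline\[ Id: (?P<id>[0-9a-f-]+), "
    r"Nodes: (?P<nodes>.*?), Type:\w+, Factor:(?P<factor>\w+)"
)
RACK_RE = re.compile(r"networkLocation: (/[^,}\s]+)")


def single_rack_pipelines(log_text):
    """Return (pipeline_id, rack) for multi-node pipelines whose nodes share one rack."""
    found = []
    for line in log_text.splitlines():
        match = PIPELINE_RE.search(line)
        if match is None or match.group("factor") == "ONE":
            # Not a pipeline creation entry, or a single-node pipeline,
            # for which rack awareness does not apply.
            continue
        racks = RACK_RE.findall(match.group("nodes"))
        if len(racks) > 1 and len(set(racks)) == 1:
            found.append((match.group("id"), racks[0]))
    return found
```

Run against the attached log, this should report the Factor:THREE pipelines that were created entirely on one rack, provided the log format matches the assumption above.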
> I believe there are a few things to consider here:
> 1) Do we need a better way to see if rack awareness is enabled? Currently we
> check the network topology for a count of rack nodes, but these are only
> created as the nodes register. Should we use the cluster map to determine the
> intended number of racks on the cluster?
> 2) Should we fall back to non-rack-aware so easily? Pipelines are long lived,
> and if they are created non-rack-aware, they will stay that way, potentially
> forever. Maybe we need to delay pipeline creation on startup until the node
> count settles?
> 3) If a pipeline or new container is being placed non-rack-aware in a
> rack-aware cluster, should we complain loudly in the logs, JMX, and Recon?
> 4) Do we need something to check for non-rack-aware pipelines and fix them
> where possible? E.g. if we have 2 racks and stop 1 rack, then we must create a
> non-rack-aware pipeline to keep on writing, but when the other rack is
> restarted, that pipeline should be destroyed and a new rack-aware one created.
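To make points 1 and 4 concrete, here is a minimal sketch in plain Python. It is not SCM code: every function name is hypothetical, the "configured racks" list stands in for whatever the cluster map provides, and pipelines are modelled as a simple mapping from pipeline id to the rack of each member node.

```python
def rack_aware_expected(configured_racks):
    """Point 1: decide from the intended topology (e.g. the cluster map),
    not from the racks of the nodes that happen to have registered so far."""
    return len(set(configured_racks)) > 1


def is_rack_aware(pipeline_racks):
    """A pipeline is rack aware if its nodes span more than one rack."""
    return len(set(pipeline_racks)) > 1


def pipelines_to_rebuild(pipelines, registered_racks):
    """Point 4: ids of multi-node pipelines that sit on a single rack even
    though at least two racks currently have registered nodes."""
    if len(set(registered_racks)) < 2:
        # Only one rack is up, so a one-rack pipeline is the best available.
        return []
    return [
        pipeline_id
        for pipeline_id, racks in pipelines.items()
        if len(racks) > 1 and not is_rack_aware(racks)
    ]


# Usage with the two three-node pipelines from the log above.
pipelines = {
    "4f16913d-ec06-44b4-a577-6664a517e401": ["/rack1", "/rack1", "/rack1"],
    "70cfd35d-b778-42df-bcba-3ba14bd8ead0": ["/rack2", "/rack2", "/rack2"],
}
print(pipelines_to_rebuild(pipelines, ["/rack1", "/rack2"]))
```

With both racks registered, both pipelines from the log would be flagged for replacement; while only one rack is registered, nothing is flagged, which corresponds to the "keep on writing" case in point 4.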