[ 
https://issues.apache.org/jira/browse/HDDS-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-12080:
----------------------------------
    Labels: pull-request-available  (was: )

> Clear irrelevant RATIS THREE pipelines on datanodes
> ---------------------------------------------------
>
>                 Key: HDDS-12080
>                 URL: https://issues.apache.org/jira/browse/HDDS-12080
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode, SCM
>    Affects Versions: 2.0.0
>            Reporter: Vyacheslav Tutrinov
>            Assignee: Vyacheslav Tutrinov
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Lifecycle of RATIS/THREE Pipelines
> The lifecycle of RATIS/THREE pipelines is fairly simple: they are created 
> automatically by the SCM (periodic invocation of 
> {*}RatisPipelineProvider.create(...){*}) and closed automatically (unless 
> manually closed via CLI) for several possible reasons:
>  - {*}Slow Followers{*}: If the followers in the pipeline's RAFT group are 
> slow, pipeline operations fail, and the datanode triggers pipeline closure 
> (this operation is sent to the SCM along with a heartbeat, and the SCM then 
> deletes the pipeline on its side).
>  - {*}Stuck in ALLOCATED State{*}: If a pipeline created by the SCM remains 
> in the ALLOCATED state for too long (e.g., datanodes do not retrieve pipeline 
> creation tasks from the SCM during heartbeats), the SCM triggers a task to 
> close the pipeline. This is handled by the {*}PipelineManagerImpl{*} class 
> in the {*}scrubPipelines{*} method.
>  - {*}Missing Heartbeats{*}: If a datanode in a pipeline's group stops 
> sending heartbeats for a prolonged period, the SCM marks it as STALE and 
> begins its finalization, which involves closing the pipelines in which the 
> node participates.
> The last case is the most intriguing of the three. Let's examine it closely.
> ----
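
A minimal sketch of the "Stuck in ALLOCATED State" case described above, written in Python for brevity rather than Ozone's Java. The names `Pipeline`, `scrub_pipelines`, and `allocated_timeout` are hypothetical and only illustrate the idea; this is not the actual `PipelineManagerImpl.scrubPipelines` code.

```python
from dataclasses import dataclass
from enum import Enum


class PipelineState(Enum):
    ALLOCATED = "ALLOCATED"
    OPEN = "OPEN"
    CLOSED = "CLOSED"


@dataclass
class Pipeline:
    pipeline_id: str
    state: PipelineState
    created_at: float  # creation time in seconds (hypothetical field)


def scrub_pipelines(pipelines, now, allocated_timeout):
    """Close pipelines that stayed in ALLOCATED longer than the timeout.

    Returns the ids of the pipelines closed by this pass.
    """
    closed_ids = []
    for pipeline in pipelines:
        too_old = now - pipeline.created_at > allocated_timeout
        if pipeline.state is PipelineState.ALLOCATED and too_old:
            pipeline.state = PipelineState.CLOSED
            closed_ids.append(pipeline.pipeline_id)
    return closed_ids
```

Under these assumptions, a periodic task would call `scrub_pipelines` with the current time; OPEN pipelines and recently allocated ones are left untouched.
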
> h3. HEALTHY->STALE->DEAD Datanode and Its Pipelines
> Here's an interesting scenario:
> 1. Assume there are *N* RATIS/THREE pipelines involving the datanode. Let 
> {*}N=32{*}.
> 2. The datanode is registered with the SCM, and the SCM periodically checks 
> in the *StaleNodeHandler* whether the datanode has become STALE (stopped 
> sending heartbeats).
> 3. If the datanode stops sending heartbeats, the SCM marks it as STALE and 
> initiates its finalization:
> {code:bash}
> 2024-12-19 17:04:09,383 INFO  node.StaleNodeHandler 
> (StaleNodeHandler.java:onMessage(57)) - Datanode 
> af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1) moved to 
> stale state. Finalizing its pipelines 
> [PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e, 
> PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8, 
> PipelineID=e6b6854c-d4d9-4c93-9b46-82ec5e0fed95, 
> PipelineID=7aeb0b55-6ed1-4b68-b316-10bb1f976cff, 
> PipelineID=cbca39b9-3fbb-4167-8b90-33960460c700, 
> PipelineID=eb950f17-b001-4679-9fba-39a590c0f72f, 
> PipelineID=512bacdb-ef2e-4cb6-aa98-05cf6be77049, 
> PipelineID=fc5d1a92-6075-486a-a56e-c5521256afe8, 
> PipelineID=06e1d2bc-b84e-455f-b8bd-f115c9cdde7d, 
> PipelineID=a30c7d15-25dd-4599-a4d6-ad52a65d0010, 
> PipelineID=7fef5cd4-dca3-4268-b835-fc7437d40372, 
> PipelineID=c8982830-2d8a-40c5-92d3-f07cd1509e80, 
> PipelineID=f13100dc-1cb2-42e2-9850-6139f47d7c71, 
> PipelineID=a7cfcb55-bbd1-4744-a274-3643ad61b205, 
> PipelineID=224ff311-b86b-457d-8542-0c513ba5a4a1, 
> PipelineID=2824ca06-52dd-4c2c-a8ba-da8c0aedb3e9, 
> PipelineID=751ba40f-62cb-4c76-b68f-ad21a68f3aad, 
> PipelineID=18f6122b-2b54-4a9d-8ee8-f1618f2cce58, 
> PipelineID=7a50de7e-ba95-4d3c-9f30-4bc00de9aa32, 
> PipelineID=eeb3e427-1d47-4677-8ff5-7e9671b932d2, 
> PipelineID=31ac2d9f-ece6-40fc-9e5f-79e7ef28b794, 
> PipelineID=871f6a99-3c5a-4283-be36-6df575d41fa6, 
> PipelineID=b148ce0d-3f14-4f70-a583-ba896a613024, 
> PipelineID=d71fe274-56bb-46ae-b017-a2e9386c8843, 
> PipelineID=59439d88-ba41-4637-bf88-2e46e7cad889, 
> PipelineID=53b3bcbe-ab75-4bc0-aeff-900db9029a79, 
> PipelineID=7a940c61-eece-4894-bb7b-9b891c9e1eb0, 
> PipelineID=fa452b0e-f43f-4190-a23c-5f2a088247c3, 
> PipelineID=a24a8f8b-f7f4-4013-8a28-bed4437b6359, 
> PipelineID=33b42053-b4e4-461b-90ab-f941ae263aef, 
> PipelineID=b3eb6560-818c-48ac-a334-a552cfe57a56, 
> PipelineID=ac385fed-0c9c-416c-8808-7eff784d6220]
> {code}
> 4. Finalization involves creating a batch of commands for the datanode to 
> close and delete its 32 pipelines:
> {code:bash}
> 2024-12-19 17:08:09,432 INFO  pipeline.PipelineManagerImpl 
> (PipelineManagerImpl.java:scrubPipelines(610)) - Scrubbing pipeline: id: 
> PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e since it stays at CLOSED 
> stage.
> 2024-12-19 17:08:09,436 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$close$4(270)) - Send 
> pipeline:PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e close command to 
> datanode b602aa44-832e-45f6-8bc7-18b2c0ca747b
> 2024-12-19 17:08:09,442 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$close$4(270)) - Send 
> pipeline:PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e close command to 
> datanode af19f8cd-65fb-465d-bccf-310da1d8acc4
> 2024-12-19 17:08:09,443 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$close$4(270)) - Send 
> pipeline:PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e close command to 
> datanode 4a0d6707-e72e-4454-b0e3-b4ede5f8fcee
> 2024-12-19 17:08:09,450 INFO  pipeline.PipelineManagerImpl 
> (PipelineManagerImpl.java:removePipeline(459)) - Pipeline Pipeline[ Id: 
> 29ffde4a-f0de-48e2-8ab5-80ef832dbc3e, Nodes: 
> b602aa44-832e-45f6-8bc7-18b2c0ca747b(test1.ozone.test/127.0.0.1) 
> ReplicaIndex: 
> 0af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1) 
> ReplicaIndex: 
> 04a0d6707-e72e-4454-b0e3-b4ede5f8fcee(test1.ozone.test/127.0.0.1) 
> ReplicaIndex: 0, ReplicationConfig: RATIS/THREE, State:CLOSED, 
> leaderId:4a0d6707-e72e-4454-b0e3-b4ede5f8fcee, 
> CreationTimestamp2024-12-19T16:48:09.374+03:00[Europe/Moscow]] removed.
> 2024-12-19 17:08:09,450 INFO  pipeline.PipelineManagerImpl 
> (PipelineManagerImpl.java:scrubPipelines(610)) - Scrubbing pipeline: id: 
> PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 since it stays at CLOSED 
> stage.
> 2024-12-19 17:08:09,451 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$close$4(270)) - Send 
> pipeline:PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 close command to 
> datanode b602aa44-832e-45f6-8bc7-18b2c0ca747b
> 2024-12-19 17:08:09,451 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$close$4(270)) - Send 
> pipeline:PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 close command to 
> datanode af19f8cd-65fb-465d-bccf-310da1d8acc4
> 2024-12-19 17:08:09,451 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$close$4(270)) - Send 
> pipeline:PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 close command to 
> datanode 4a0d6707-e72e-4454-b0e3-b4ede5f8fcee
> {code}
> 5. If the datanode remains unresponsive, the SCM marks it as DEAD and clears 
> its command queue, discarding the pending close-pipeline commands:
> {code:bash}
> 2024-12-19 17:08:51,403 INFO  node.DeadNodeHandler 
> (DeadNodeHandler.java:onMessage(108)) - Clearing command queue of size 32 for 
> DN af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1)
> {code}
> 6. If the datanode then resumes sending heartbeats, it will:
>  - Retain knowledge of the 32 pipelines that were already closed by the SCM.
>  - Receive new commands to create an additional 32 pipelines (since the SCM 
> no longer associates the old pipelines with the datanode).
> 7. This can result in the datanode managing {*}64 pipelines{*}. If such 
> interruptions occur repeatedly, the number of pipelines can skyrocket, 
> potentially overwhelming the datanode (e.g., insufficient memory during a 
> restart due to the sheer number of RAFT groups).
> This scenario illustrates how pipeline management challenges could escalate 
> and potentially destabilize the system under specific edge cases.
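
The accumulation in steps 1 to 7 can be reproduced with a toy model. This is a sketch under simplifying assumptions: `Scm`, `Datanode`, and `simulate` are hypothetical names, the SCM is assumed to always top the node up to a fixed number of pipelines, and the close commands dropped at the DEAD transition are assumed never to reach the node.

```python
import itertools


class Scm:
    """Toy SCM that tracks which pipelines it associates with each node."""

    def __init__(self, pipelines_per_node):
        self.pipelines_per_node = pipelines_per_node
        self.pipelines = {}  # node id -> set of pipeline ids
        self._ids = itertools.count()

    def assign(self, node_id):
        """Create pipelines until the node reaches the target; return the new ids."""
        known = self.pipelines.setdefault(node_id, set())
        created = set()
        while len(known) < self.pipelines_per_node:
            pipeline_id = f"pipeline-{next(self._ids)}"
            known.add(pipeline_id)
            created.add(pipeline_id)
        return created

    def mark_dead(self, node_id):
        """STALE -> DEAD: the SCM forgets the pipelines and drops queued commands."""
        self.pipelines[node_id] = set()


class Datanode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.raft_groups = set()  # groups the node still holds locally

    def heartbeat(self, scm):
        # The node only ever learns about new pipelines; nothing tells it
        # to drop the ones the SCM removed while the node was silent.
        self.raft_groups |= scm.assign(self.node_id)


def simulate(outages, pipelines_per_node=32):
    """Return how many raft groups the datanode holds after the given outages."""
    scm = Scm(pipelines_per_node)
    datanode = Datanode("dn-1")
    datanode.heartbeat(scm)
    for _ in range(outages):
        scm.mark_dead(datanode.node_id)
        datanode.heartbeat(scm)
    return len(datanode.raft_groups)
```

In this model `simulate(1)` yields the 64 pipelines from step 7, and each further outage adds another 32.
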
> So, when the SCM considers a datanode DEAD and that datanode is restarted, 
> the raft logs (raft groups, pipelines) it still holds are irrelevant at that 
> point and can be deleted without trying to initialize them, which saves a 
> lot of startup time and avoids unnecessary memory consumption.
> *See also:*
> [https://github.com/apache/ozone/discussions/7186]
> https://issues.apache.org/jira/browse/HDDS-11856
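
One possible shape of the cleanup proposed in the description, as a hedged sketch: split the raft groups found on disk into those the SCM still knows and those it does not. The function name `split_raft_groups` is hypothetical, and how the datanode obtains the SCM's pipeline list is an assumption; the issue text does not specify it.

```python
def split_raft_groups(local_groups, scm_pipelines):
    """Partition local raft groups into (relevant, irrelevant) id lists.

    local_groups:  pipeline ids found in the datanode's local storage.
    scm_pipelines: pipeline ids the SCM still associates with this node
                   (assumed to be available before the groups are started).
    """
    local = set(local_groups)
    relevant = local & set(scm_pipelines)
    irrelevant = local - relevant
    return sorted(relevant), sorted(irrelevant)
```

Only the relevant groups would then be initialized on restart; the irrelevant ones could be deleted without being loaded.
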



