[jira] [Updated] (HDDS-12080) Clear irrelevant RATIS THREE pipelines on datanodes

Vyacheslav Tutrinov (Jira) Tue, 14 Jan 2025 03:37:29 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vyacheslav Tutrinov updated HDDS-12080:
---------------------------------------
    Description: 
h3. Lifecycle of RATIS/THREE Pipelines

The lifecycle of RATIS/THREE pipelines is fairly simple: they are created 
automatically by the SCM (periodic invocation of 
*RatisPipelineProvider.create(...)*) and closed automatically (unless 
manually closed via CLI) for several possible reasons:
 - {*}Slow Followers{*}: If the followers in the pipeline's RAFT group are 
slow, pipeline operations fail, and the datanode triggers pipeline closure 
(this operation is sent to the SCM along with a heartbeat, and the SCM then 
deletes the pipeline on its side).
 - {*}Stuck in ALLOCATED State{*}: If a pipeline created by the SCM remains in 
the ALLOCATED state for too long (e.g., datanodes do not retrieve pipeline 
creation tasks from the SCM during heartbeats), the SCM triggers a task to 
close the pipeline. This is handled by the *scrubPipelines* method of the 
*PipelineManagerImpl* class.
 - {*}Missing Heartbeats{*}: If a datanode in a pipeline's group stops sending 
heartbeats for a prolonged period, the SCM marks it as STALE and begins its 
finalization, which involves closing the pipelines in which the node 
participates.
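
As a rough illustration of the last two triggers, the sketch below classifies a datanode by the age of its last heartbeat and checks whether a pipeline has stayed in ALLOCATED for too long. It is not the SCM implementation: the threshold values, the function names, and the `Pipeline` record are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    HEALTHY = "HEALTHY"
    STALE = "STALE"
    DEAD = "DEAD"


# Hypothetical thresholds in seconds; a real cluster configures its own values.
STALE_AFTER_S = 90.0
DEAD_AFTER_S = 600.0
ALLOCATED_TIMEOUT_S = 300.0


def node_state(now_s: float, last_heartbeat_s: float) -> NodeState:
    """Classify a datanode by the age of its last heartbeat."""
    age = now_s - last_heartbeat_s
    if age >= DEAD_AFTER_S:
        return NodeState.DEAD
    if age >= STALE_AFTER_S:
        return NodeState.STALE
    return NodeState.HEALTHY


@dataclass
class Pipeline:
    pipeline_id: str
    state: str        # "ALLOCATED", "OPEN" or "CLOSED"
    created_s: float  # creation time in seconds


def should_scrub(p: Pipeline, now_s: float) -> bool:
    """True when a pipeline has stayed in ALLOCATED longer than the timeout."""
    return p.state == "ALLOCATED" and now_s - p.created_s > ALLOCATED_TIMEOUT_S
```

With these assumed thresholds, a node whose last heartbeat is 100 seconds old is classified as STALE, and one that has been silent for 900 seconds as DEAD.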

The last case is the most intriguing of the three. Let's examine it closely.
----
h3. HEALTHY->STALE->DEAD Datanode and Its Pipelines

Here's an interesting scenario:

1. Assume there are *N* RATIS/THREE pipelines involving the datanode. Let 
{*}N=32{*}.

2. The datanode is registered with the SCM, and the SCM periodically checks in 
the *StaleNodeHandler* whether the datanode has become STALE (stopped sending 
heartbeats).

3. If the datanode stops sending heartbeats, the SCM marks it as STALE and 
initiates its finalization:
{code:bash}
2024-12-19 17:04:09,383 INFO  node.StaleNodeHandler 
(StaleNodeHandler.java:onMessage(57)) - Datanode 
af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1) moved to stale 
state. Finalizing its pipelines 
[PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e, 
PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8, 
PipelineID=e6b6854c-d4d9-4c93-9b46-82ec5e0fed95, 
PipelineID=7aeb0b55-6ed1-4b68-b316-10bb1f976cff, 
PipelineID=cbca39b9-3fbb-4167-8b90-33960460c700, 
PipelineID=eb950f17-b001-4679-9fba-39a590c0f72f, 
PipelineID=512bacdb-ef2e-4cb6-aa98-05cf6be77049, 
PipelineID=fc5d1a92-6075-486a-a56e-c5521256afe8, 
PipelineID=06e1d2bc-b84e-455f-b8bd-f115c9cdde7d, 
PipelineID=a30c7d15-25dd-4599-a4d6-ad52a65d0010, 
PipelineID=7fef5cd4-dca3-4268-b835-fc7437d40372, 
PipelineID=c8982830-2d8a-40c5-92d3-f07cd1509e80, 
PipelineID=f13100dc-1cb2-42e2-9850-6139f47d7c71, 
PipelineID=a7cfcb55-bbd1-4744-a274-3643ad61b205, 
PipelineID=224ff311-b86b-457d-8542-0c513ba5a4a1, 
PipelineID=2824ca06-52dd-4c2c-a8ba-da8c0aedb3e9, 
PipelineID=751ba40f-62cb-4c76-b68f-ad21a68f3aad, 
PipelineID=18f6122b-2b54-4a9d-8ee8-f1618f2cce58, 
PipelineID=7a50de7e-ba95-4d3c-9f30-4bc00de9aa32, 
PipelineID=eeb3e427-1d47-4677-8ff5-7e9671b932d2, 
PipelineID=31ac2d9f-ece6-40fc-9e5f-79e7ef28b794, 
PipelineID=871f6a99-3c5a-4283-be36-6df575d41fa6, 
PipelineID=b148ce0d-3f14-4f70-a583-ba896a613024, 
PipelineID=d71fe274-56bb-46ae-b017-a2e9386c8843, 
PipelineID=59439d88-ba41-4637-bf88-2e46e7cad889, 
PipelineID=53b3bcbe-ab75-4bc0-aeff-900db9029a79, 
PipelineID=7a940c61-eece-4894-bb7b-9b891c9e1eb0, 
PipelineID=fa452b0e-f43f-4190-a23c-5f2a088247c3, 
PipelineID=a24a8f8b-f7f4-4013-8a28-bed4437b6359, 
PipelineID=33b42053-b4e4-461b-90ab-f941ae263aef, 
PipelineID=b3eb6560-818c-48ac-a334-a552cfe57a56, 
PipelineID=ac385fed-0c9c-416c-8808-7eff784d6220]
{code}
4. Finalization involves creating a batch of commands for the datanode to close 
and delete its 32 pipelines:
{code:bash}
2024-12-19 17:08:09,432 INFO  pipeline.PipelineManagerImpl 
(PipelineManagerImpl.java:scrubPipelines(610)) - Scrubbing pipeline: id: 
PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e since it stays at CLOSED stage.
2024-12-19 17:08:09,436 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$close$4(270)) - Send 
pipeline:PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e close command to 
datanode b602aa44-832e-45f6-8bc7-18b2c0ca747b
2024-12-19 17:08:09,442 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$close$4(270)) - Send 
pipeline:PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e close command to 
datanode af19f8cd-65fb-465d-bccf-310da1d8acc4
2024-12-19 17:08:09,443 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$close$4(270)) - Send 
pipeline:PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e close command to 
datanode 4a0d6707-e72e-4454-b0e3-b4ede5f8fcee
2024-12-19 17:08:09,450 INFO  pipeline.PipelineManagerImpl 
(PipelineManagerImpl.java:removePipeline(459)) - Pipeline Pipeline[ Id: 
29ffde4a-f0de-48e2-8ab5-80ef832dbc3e, Nodes: 
b602aa44-832e-45f6-8bc7-18b2c0ca747b(test1.ozone.test/127.0.0.1) ReplicaIndex: 
0af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1) ReplicaIndex: 
04a0d6707-e72e-4454-b0e3-b4ede5f8fcee(test1.ozone.test/127.0.0.1) ReplicaIndex: 
0, ReplicationConfig: RATIS/THREE, State:CLOSED, 
leaderId:4a0d6707-e72e-4454-b0e3-b4ede5f8fcee, 
CreationTimestamp2024-12-19T16:48:09.374+03:00[Europe/Moscow]] removed.
2024-12-19 17:08:09,450 INFO  pipeline.PipelineManagerImpl 
(PipelineManagerImpl.java:scrubPipelines(610)) - Scrubbing pipeline: id: 
PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 since it stays at CLOSED stage.
2024-12-19 17:08:09,451 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$close$4(270)) - Send 
pipeline:PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 close command to 
datanode b602aa44-832e-45f6-8bc7-18b2c0ca747b
2024-12-19 17:08:09,451 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$close$4(270)) - Send 
pipeline:PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 close command to 
datanode af19f8cd-65fb-465d-bccf-310da1d8acc4
2024-12-19 17:08:09,451 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$close$4(270)) - Send 
pipeline:PipelineID=6358cd69-531e-49ea-9cd6-6c505f1b9bd8 close command to 
datanode 4a0d6707-e72e-4454-b0e3-b4ede5f8fcee
{code}
5. If the datanode remains unresponsive, the SCM marks it as DEAD and clears 
the command queue for closing its pipelines:
{code:bash}
2024-12-19 17:08:51,403 INFO  node.DeadNodeHandler 
(DeadNodeHandler.java:onMessage(108)) - Clearing command queue of size 32 for 
DN af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1)
{code}
6. If the datanode then resumes sending heartbeats, it will:
 - Retain knowledge of the 32 pipelines that were already closed by the SCM.
 - Receive new commands to create an additional 32 pipelines (since the SCM no 
longer associates the old pipelines with the datanode).

7. This can result in the datanode managing {*}64 pipelines{*}. If such 
interruptions occur repeatedly, the number of pipelines can skyrocket, 
potentially overwhelming the datanode (e.g., insufficient memory during a 
restart due to the sheer number of RAFT groups).

This scenario shows how obsolete pipelines can accumulate on a datanode and 
potentially destabilize it under specific edge cases.
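
The arithmetic in steps 6 and 7 can be sketched as a toy simulation. It assumes that every outage follows the full HEALTHY->STALE->DEAD->HEALTHY cycle, that the close commands never reach the datanode, and that the SCM creates the same number of new pipelines each time. The function name and the pipeline ID scheme are hypothetical.

```python
def simulate_outages(initial: int, outages: int) -> int:
    """Return how many Raft groups the datanode holds after repeated
    HEALTHY -> STALE -> DEAD -> HEALTHY cycles, under the stated assumptions."""
    on_datanode = {f"p0-{i}" for i in range(initial)}
    known_to_scm = set(on_datanode)
    for cycle in range(1, outages + 1):
        # STALE: the SCM closes and removes the node's pipelines on its side.
        known_to_scm.clear()
        # DEAD: the queued close commands are dropped, so the datanode never
        # learns that its local groups are obsolete (on_datanode is unchanged).
        # Back to HEALTHY: the SCM creates a fresh set of pipelines.
        fresh = {f"p{cycle}-{i}" for i in range(initial)}
        known_to_scm |= fresh
        on_datanode |= fresh
    return len(on_datanode)


print(simulate_outages(32, 1))  # 64
print(simulate_outages(32, 3))  # 128
```

Under these assumptions the count grows linearly, by 32 groups per outage, while the SCM only ever tracks the latest 32.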

So, when a datanode that the SCM has marked as DEAD is restarted, many of its 
Raft logs (Raft groups, pipelines) can be deleted without trying to initialize 
them, because they are irrelevant at that point. This saves a lot of time and 
avoids the memory consumption.
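
One way to picture the proposed cleanup is a set difference between the Raft groups found on the datanode's disk and the pipelines the SCM still tracks for that node. The issue does not specify how the datanode would obtain the SCM's list or where the check would run, so the function below is only a sketch with hypothetical names and inputs.

```python
def split_groups(local_groups, scm_pipelines):
    """Partition locally stored Raft group IDs into (keep, delete).

    local_groups:  IDs found on the datanode after restart (assumed input).
    scm_pipelines: IDs the SCM still associates with the node (assumed input).
    """
    known = set(scm_pipelines)
    keep = sorted(g for g in local_groups if g in known)
    delete = sorted(g for g in local_groups if g not in known)
    return keep, delete


keep, delete = split_groups(["a", "b", "c"], ["b", "x"])
print(keep)    # ['b']
print(delete)  # ['a', 'c']
```

Groups in the `delete` list would be removed without being initialized, which is where the time and memory savings described above would come from.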

*See also:*
[https://github.com/apache/ozone/discussions/7186]
https://issues.apache.org/jira/browse/HDDS-11856

  was:
### Lifecycle of RATIS/THREE Pipelines

The lifecycle of RATIS/THREE pipelines is fairly simple: they are created 
automatically by the SCM (periodic invocation of 
**RatisPipelineProvider.create(...)**) and closed automatically (unless 
manually closed via CLI) for several possible reasons:

- **Slow Followers**: If the followers in the pipeline's RAFT group are slow, 
pipeline operations fail, and the datanode triggers pipeline closure (this 
operation is sent to the SCM along with a heartbeat, and the SCM then deletes 
the pipeline on its side).
- **Stuck in ALLOCATED State**: If a pipeline created by the SCM remains in the 
ALLOCATED state for too long (e.g., datanodes do not retrieve pipeline creation 
tasks from the SCM during heartbeats), the SCM triggers a task to close the 
pipeline. This is handled by the **PipelineManagerImpl** class in the 
**scrubPipelines** method.
- **Missing Heartbeats**: If a datanode in a pipeline's group stops sending 
heartbeats for a prolonged period, the SCM marks it as STALE and begins its 
finalization, which involves closing the pipelines in which the node 
participates.

The last case is the most intriguing of the three. Let's examine it closely.

---

### HEALTHY->STALE->DEAD Datanode and Its Pipelines

Here's an interesting scenario:

1. Assume there are **N** RATIS/THREE pipelines involving the datanode. Let 
**N=32**.

2. The datanode is registered with the SCM, and the SCM periodically checks in 
the **StaleNodeHandler** whether the datanode has become STALE (stopped sending 
heartbeats).

3. If the datanode stops sending heartbeats, the SCM marks it as STALE and 
initiates its finalization:

   ```bash
   2024-12-19 17:04:09,383 INFO  node.StaleNodeHandler 
(StaleNodeHandler.java:onMessage(57)) - Datanode 
af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1) moved to stale 
state. Finalizing its pipelines 
[PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e, ...]
   ```

4. Finalization involves creating a batch of commands for the datanode to close 
and delete its 32 pipelines:

   ```bash
   2024-12-19 17:08:09,432 INFO  pipeline.PipelineManagerImpl 
(PipelineManagerImpl.java:scrubPipelines(610)) - Scrubbing pipeline: id: 
PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e since it stays at CLOSED stage.
   2024-12-19 17:08:09,450 INFO  pipeline.PipelineManagerImpl 
(PipelineManagerImpl.java:removePipeline(459)) - Pipeline Pipeline[Id: 
29ffde4a-f0de-48e2-8ab5-80ef832dbc3e ...] removed.
   ```

5. If the datanode remains unresponsive, the SCM marks it as DEAD and clears 
the command queue for closing its pipelines:

   ```bash
   2024-12-19 17:08:51,403 INFO  node.DeadNodeHandler 
(DeadNodeHandler.java:onMessage(108)) - Clearing command queue of size 32 for 
DN af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1)
   ```

6. If the datanode then resumes sending heartbeats, it will:
   - Retain knowledge of the 32 pipelines that were already closed by the SCM.
   - Receive new commands to create an additional 32 pipelines (since the SCM 
no longer associates the old pipelines with the datanode).

7. This can result in the datanode managing **64 pipelines**. If such 
interruptions occur repeatedly, the number of pipelines can skyrocket, 
potentially overwhelming the datanode (e.g., insufficient memory during a 
restart due to the sheer number of RAFT groups).

This scenario illustrates how pipeline management challenges could escalate and 
potentially destabilize the system under specific edge cases.

So, when a datanode that the SCM has marked as DEAD is restarted, many of its 
Raft logs (Raft groups, pipelines) can be deleted without trying to initialize 
them, because they are irrelevant at that point. This saves a lot of time and 
avoids the memory consumption.

*See also: *
https://github.com/apache/ozone/discussions/7186
https://issues.apache.org/jira/browse/HDDS-11856


> Clear irrelevant RATIS THREE pipelines on datanodes
> ---------------------------------------------------
>
>                 Key: HDDS-12080
>                 URL: https://issues.apache.org/jira/browse/HDDS-12080
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode, SCM
>    Affects Versions: 2.0.0
>            Reporter: Vyacheslav Tutrinov
>            Assignee: Vyacheslav Tutrinov
>            Priority: Major



