Vyacheslav Tutrinov created HDDS-12080:
------------------------------------------
Summary: Clear irrelevant RATIS THREE pipelines on datanodes
Key: HDDS-12080
URL: https://issues.apache.org/jira/browse/HDDS-12080
Project: Apache Ozone
Issue Type: Improvement
Components: Ozone Datanode, SCM
Affects Versions: 2.0.0
Reporter: Vyacheslav Tutrinov
Assignee: Vyacheslav Tutrinov
### Lifecycle of RATIS/THREE Pipelines
The lifecycle of RATIS/THREE pipelines is fairly simple: they are created
automatically by the SCM (periodic invocation of
**RatisPipelineProvider.create(...)**) and closed automatically (unless
manually closed via CLI) for several possible reasons:
- **Slow Followers**: If the followers in the pipeline's RAFT group are slow,
pipeline operations fail, and the datanode triggers pipeline closure (this
operation is sent to the SCM along with a heartbeat, and the SCM then deletes
the pipeline on its side).
- **Stuck in ALLOCATED State**: If a pipeline created by the SCM remains in the
ALLOCATED state for too long (e.g., datanodes do not retrieve pipeline creation
tasks from the SCM during heartbeats), the SCM triggers a task to close the
pipeline. This is handled by the **PipelineManagerImpl** class in the
**scrubPipelines** method.
- **Missing Heartbeats**: If a datanode in a pipeline's group stops sending
heartbeats for a prolonged period, the SCM marks it as STALE and begins its
finalization, which involves closing the pipelines in which the node
participates.
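As a rough illustration of the second case (a pipeline stuck in ALLOCATED), the sketch below models a scrub pass that closes pipelines which have stayed in that state longer than a timeout. It is a toy model in Python, not Ozone code: the `Pipeline` class, the `scrub_pipelines` function, and the 300-second timeout are assumptions made for illustration only, and the real logic lives in **PipelineManagerImpl.scrubPipelines**.

```python
from dataclasses import dataclass

# Hypothetical timeout; the real value is an SCM configuration setting.
ALLOCATED_TIMEOUT_SEC = 300.0


@dataclass
class Pipeline:
    """Toy stand-in for an SCM-side pipeline record (not the Ozone class)."""
    pipeline_id: str
    state: str          # "ALLOCATED", "OPEN" or "CLOSED"
    created_at: float   # creation time, in seconds


def scrub_pipelines(pipelines, now, timeout=ALLOCATED_TIMEOUT_SEC):
    """Close every pipeline that has stayed in ALLOCATED longer than `timeout`.

    Returns the ids of the pipelines closed in this pass.
    """
    closed = []
    for p in pipelines:
        if p.state == "ALLOCATED" and now - p.created_at > timeout:
            p.state = "CLOSED"
            closed.append(p.pipeline_id)
    return closed


pipelines = [
    Pipeline("p1", "ALLOCATED", created_at=0.0),    # stuck: never opened
    Pipeline("p2", "OPEN", created_at=0.0),         # healthy, left alone
    Pipeline("p3", "ALLOCATED", created_at=900.0),  # created recently
]
closed_ids = scrub_pipelines(pipelines, now=1000.0)
print(closed_ids)
```

Only `p1` is old enough to be scrubbed here; `p3` is still inside the assumed timeout window and keeps its ALLOCATED state.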
The last case is the most intriguing of the three. Let's examine it closely.
---
### HEALTHY->STALE->DEAD Datanode and Its Pipelines
Here's an interesting scenario:
1. Assume there are **N** RATIS/THREE pipelines involving the datanode. Let
**N=32**.
2. The datanode is registered with the SCM, and the SCM periodically checks in
the **StaleNodeHandler** whether the datanode has become STALE (stopped sending
heartbeats).
3. If the datanode stops sending heartbeats, the SCM marks it as STALE and
initiates its finalization:
```text
2024-12-19 17:04:09,383 INFO node.StaleNodeHandler
(StaleNodeHandler.java:onMessage(57)) - Datanode
af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1) moved to stale
state. Finalizing its pipelines
[PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e, ...]
```
4. Finalization involves creating a batch of commands for the datanode to close
and delete its 32 pipelines:
```text
2024-12-19 17:08:09,432 INFO pipeline.PipelineManagerImpl
(PipelineManagerImpl.java:scrubPipelines(610)) - Scrubbing pipeline: id:
PipelineID=29ffde4a-f0de-48e2-8ab5-80ef832dbc3e since it stays at CLOSED stage.
2024-12-19 17:08:09,450 INFO pipeline.PipelineManagerImpl
(PipelineManagerImpl.java:removePipeline(459)) - Pipeline Pipeline[Id:
29ffde4a-f0de-48e2-8ab5-80ef832dbc3e ...] removed.
```
5. If the datanode remains unresponsive, the SCM marks it as DEAD and clears
its command queue, discarding the queued pipeline-close commands before the
datanode ever receives them:
```text
2024-12-19 17:08:51,403 INFO node.DeadNodeHandler
(DeadNodeHandler.java:onMessage(108)) - Clearing command queue of size 32 for
DN af19f8cd-65fb-465d-bccf-310da1d8acc4(test1.ozone.test/127.0.0.1)
```
6. If the datanode then resumes sending heartbeats, it will:
- Retain knowledge of the 32 pipelines that were already closed by the SCM.
- Receive new commands to create an additional 32 pipelines (since the SCM
no longer associates the old pipelines with the datanode).
7. This can result in the datanode managing **64 pipelines**. If such
interruptions occur repeatedly, the number of pipelines can skyrocket,
potentially overwhelming the datanode (e.g., insufficient memory during a
restart due to the sheer number of RAFT groups).
This scenario shows how repeated HEALTHY->STALE->DEAD transitions can leave a
datanode with an ever-growing set of stale pipelines and eventually destabilize
it.
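The growth described in the scenario above can be sketched with a small simulation. This is a toy model under stated assumptions, not Ozone code: `ToyScm`, `ToyDatanode`, and their methods are hypothetical names, and the model assumes the SCM always tops a returning datanode back up to 32 pipelines while the datanode never forgets a pipeline on its own.

```python
class ToyScm:
    """SCM-side view: which pipelines each datanode is believed to host."""

    def __init__(self, pipelines_per_node):
        self.pipelines_per_node = pipelines_per_node
        self.known = {}       # datanode id -> set of pipeline ids
        self._next_id = 0

    def create_pipelines(self, dn_id):
        """Top the datanode up to the target count; return the new ids."""
        current = self.known.setdefault(dn_id, set())
        new_ids = set()
        while len(current) + len(new_ids) < self.pipelines_per_node:
            new_ids.add(f"pipeline-{self._next_id}")
            self._next_id += 1
        current |= new_ids
        return new_ids

    def mark_dead(self, dn_id):
        """STALE -> DEAD: the pipelines are removed on the SCM side and the
        queued close commands are dropped, so the datanode is never told."""
        self.known[dn_id] = set()


class ToyDatanode:
    def __init__(self, dn_id):
        self.dn_id = dn_id
        self.raft_groups = set()   # pipelines still kept locally

    def heartbeat(self, scm):
        """On heartbeat, apply pipeline-creation commands from the SCM."""
        self.raft_groups |= scm.create_pipelines(self.dn_id)


scm = ToyScm(pipelines_per_node=32)
dn = ToyDatanode("dn-1")
dn.heartbeat(scm)
counts = [len(dn.raft_groups)]
for _ in range(3):           # three outages long enough to reach DEAD
    scm.mark_dead(dn.dn_id)
    dn.heartbeat(scm)        # the datanode returns and gets 32 new pipelines
    counts.append(len(dn.raft_groups))
print(counts)
```

In this model the datanode's local count grows by 32 after every outage (32, 64, 96, 128), while the SCM's own view never exceeds 32 pipelines for that node.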
So, when the SCM considers a datanode DEAD and that datanode has been
restarted, its stale Raft logs (Raft groups, pipelines) can be deleted without
attempting to initialize them, because they are irrelevant at that point. This
saves a lot of startup time and avoids unnecessary memory consumption.
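One way to picture the proposed cleanup is as a set difference computed at datanode startup. The sketch below is only an illustration of that idea: `split_raft_groups` is a hypothetical helper, and it assumes the datanode can obtain the set of pipelines the SCM still associates with it, which this issue does not specify.

```python
def split_raft_groups(local_groups, scm_known_groups):
    """Partition the local Raft groups into those worth initializing and
    those that can be deleted right away because the SCM no longer
    tracks them."""
    relevant = local_groups & scm_known_groups
    irrelevant = local_groups - scm_known_groups
    return relevant, irrelevant


# Assumed inputs: groups found on disk vs. pipelines the SCM still knows.
local = {"pipeline-0", "pipeline-1", "pipeline-32", "pipeline-33"}
scm_view = {"pipeline-32", "pipeline-33"}

to_init, to_delete = split_raft_groups(local, scm_view)
print(sorted(to_init), sorted(to_delete))
```

Under these assumptions, only the groups in `to_init` would be started, and those in `to_delete` would be removed from disk without ever being loaded.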
*See also:*
https://github.com/apache/ozone/discussions/7186
https://issues.apache.org/jira/browse/HDDS-11856