sarvekshayr commented on code in PR #258:
URL: https://github.com/apache/ozone-site/pull/258#discussion_r2706810579
##########
docs/03-core-concepts/01-architecture/04-storage-container-manager.md:
##########
@@ -0,0 +1,97 @@
+---
+sidebar_label: Storage Container Manager
+---
+
+# Storage Container Manager
+
+Storage Container Manager (SCM) is the leader node for *block space management*.
+Its main responsibility is to create and manage [containers](../02-replication/01-storage-containers.md), which are the main replication unit of Ozone.
+
+
+
+## Main responsibilities
+
+Storage Container Manager provides multiple critical functions for the Ozone
cluster.
+SCM acts as the cluster manager, certificate authority, block manager, and replica manager.
+
+SCM is in charge of creating an Ozone cluster. When an SCM is booted up via the `init` command, it creates the cluster identity and the root certificates needed for the SCM certificate authority. SCM also manages the life cycle of each data node in the cluster.
+
+1. SCM is the block manager. SCM allocates blocks and assigns them to data
nodes. Clients read and write these blocks directly.
+
+2. SCM keeps track of all the block replicas. If a data node or a disk is lost, SCM detects it and instructs data nodes to make copies of the missing blocks to ensure high availability.
+
+3. **SCM's Certificate Authority** is in
+charge of issuing identity certificates for each and every
+service in the cluster. This certificate infrastructure makes
+it easy to enable mTLS at the network layer, and the block
+token infrastructure also depends on it.
+
+## Main components
+
+This section gives a quick overview of the network services that Storage Container Manager provides and the data it persists.
+
+### Network services provided by Storage Container Manager
+
+- Pipelines: List/Delete/Activate/Deactivate
+ - pipelines are set of Datanodes to form replication groups
Review Comment:
nit:
```suggestion
 - Pipelines are sets of Datanodes that form replication groups
```
##########
docs/03-core-concepts/01-architecture/04-storage-container-manager.md:
##########
@@ -0,0 +1,97 @@
+---
+sidebar_label: Storage Container Manager
+---
+
+# Storage Container Manager
+
+Storage Container Manager (SCM) is the leader node for *block space management*.
+Its main responsibility is to create and manage [containers](../02-replication/01-storage-containers.md), which are the main replication unit of Ozone.
+
+
+
+## Main responsibilities
+
+Storage Container Manager provides multiple critical functions for the Ozone
cluster.
+SCM acts as the cluster manager, certificate authority, block manager, and replica manager.
+
+SCM is in charge of creating an Ozone cluster. When an SCM is booted up via the `init` command, it creates the cluster identity and the root certificates needed for the SCM certificate authority. SCM also manages the life cycle of each data node in the cluster.
+
+1. SCM is the block manager. SCM allocates blocks and assigns them to data
nodes. Clients read and write these blocks directly.
+
+2. SCM keeps track of all the block replicas. If a data node or a disk is lost, SCM detects it and instructs data nodes to make copies of the missing blocks to ensure high availability.
+
+3. **SCM's Certificate Authority** is in
+charge of issuing identity certificates for each and every
+service in the cluster. This certificate infrastructure makes
+it easy to enable mTLS at the network layer, and the block
+token infrastructure also depends on it.
+
+## Main components
+
+This section gives a quick overview of the network services that Storage Container Manager provides and the data it persists.
+
+### Network services provided by Storage Container Manager
+
+- Pipelines: List/Delete/Activate/Deactivate
+ - pipelines are set of Datanodes to form replication groups
+ - Raft groups are planned by SCM
+- Containers: Create / List / Delete containers
+- Admin related requests
+- Safemode status/modification
+- Replication manager start / stop
+- Certificate Authority (CA) service
+ - Required by other server components
+- Datanode HeartBeat protocol
+ - From Datanode to SCM (30 sec by default)
+ - Datanodes report the status of containers, node...
+ - SCM can add commands to the response
+
+:::note
+Clients don't connect directly to SCM.
+:::
+
+### Persisted state
+
+The following data is persisted on the Storage Container Manager side in a dedicated RocksDB directory:
+
+- Pipelines
+ - Replication group of servers. Maintained to find a group for new
container/block allocations.
+- Containers
+ - Containers are the replication units. This data is needed to act when containers are under- or over-replicated.
+- Deleted blocks
+ - Block data is deleted in the background. A list is needed to track the progress.
+- Valid certificates
+ - Used by the internal Certificate Authority to authorize other Ozone services
+
+## Safe Mode
+
+SCM (Storage Container Manager) enters safe mode on startup. This is a
protective state that allows the system to become stable before it becomes
fully operational. During safe mode, certain operations like block allocation
are restricted.
+
+### How to Exit Safe Mode
+
+There are two ways to exit safe mode:
+
+1. **Automatic Exit:** SCM will automatically exit safe mode when a set of
predefined `SafeModeExitRule`s are satisfied. These rules ensure that the
cluster is in a healthy state. The primary rules are:
+ - **`DataNodeSafeModeRule`**: Checks if a minimum number of Datanodes have
registered with the SCM. This is configured by `hdds.scm.safemode.min.datanode`
(default: `3`).
+ - **`RatisContainerSafeModeRule`**: Checks if a certain percentage of
containers with at least one replica reported are available. This is configured
by `hdds.scm.safemode.threshold.pct` (default: `0.99`).
+ - **`HealthyPipelineSafeModeRule`**: Checks if a certain percentage of
pipelines are healthy. This is configured by
`hdds.scm.safemode.healthy.pipeline.pct` (default: `0.10`).
+ - **`OneReplicaPipelineSafeModeRule`**: Checks if a certain percentage of
pipelines have at least one replica reported. This is configured by
`hdds.scm.safemode.atleast.one.node.reported.pipeline.pct` (default: `0.90`).
+ - **`ECContainerSafeModeRule`**: Checks if a certain percentage of erasure
coded block groups are healthy. This is also configured by
`hdds.scm.safemode.threshold.pct` (default: `0.99`).
+
+2. **Manual Exit:** You can force SCM to exit safe mode using the `ozone admin safemode --force-exit` command.
+
+### Safe Mode Pre-Check
+
+There's also a "pre-check" phase. SCM will not exit safe mode until all
pre-check rules are satisfied.
+The `DataNodeSafeModeRule` is a pre-check rule.
+This means that SCM will wait for a minimum number of Datanodes to be
available before it even considers the other conditions for exiting safe mode.
+
+## Notable configurations
+
+| key | default | description |
Review Comment:
Capitalise the table column headers.
```suggestion
| Key | Default | Description |
```
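For context on the Safe Mode section quoted above, here is a minimal Python sketch of how the pre-check and the remaining exit rules described there might combine. It is not SCM's actual implementation: `ClusterState`, `precheck_passed`, and `can_exit_safe_mode` are hypothetical names chosen for illustration, and only the threshold values are taken from the defaults listed in the diff.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of what SCM has learned from Datanode reports."""
    registered_datanodes: int
    ratis_containers_pct: float      # share of Ratis containers with a replica reported
    ec_containers_pct: float         # share of healthy erasure-coded block groups
    healthy_pipelines_pct: float     # share of healthy pipelines
    one_replica_pipelines_pct: float # share of pipelines with one node reported


# Defaults as listed in the quoted Safe Mode section.
MIN_DATANODES = 3                # hdds.scm.safemode.min.datanode
CONTAINER_THRESHOLD = 0.99       # hdds.scm.safemode.threshold.pct
HEALTHY_PIPELINE_PCT = 0.10      # hdds.scm.safemode.healthy.pipeline.pct
ONE_REPLICA_PIPELINE_PCT = 0.90  # hdds.scm.safemode.atleast.one.node.reported.pipeline.pct


def precheck_passed(state: ClusterState) -> bool:
    """Pre-check phase: enough Datanodes must have registered."""
    return state.registered_datanodes >= MIN_DATANODES


def can_exit_safe_mode(state: ClusterState) -> bool:
    """The other rules are only considered once the pre-check passes."""
    if not precheck_passed(state):
        return False
    return (
        state.ratis_containers_pct >= CONTAINER_THRESHOLD
        and state.ec_containers_pct >= CONTAINER_THRESHOLD
        and state.healthy_pipelines_pct >= HEALTHY_PIPELINE_PCT
        and state.one_replica_pipelines_pct >= ONE_REPLICA_PIPELINE_PCT
    )
```

In this sketch a cluster with only two registered Datanodes stays in safe mode regardless of the container and pipeline percentages, which mirrors the pre-check behaviour the section describes.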
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]