[
https://issues.apache.org/jira/browse/HDDS-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-12151:
------------------------------
Description:
We have observed cases where completely full Ozone volumes (reserved space
exhausted as well, with StorageVolumeChecker throwing) can lead to
complications such as container state divergence, where a container ends up
with replicas that have different contents (blocks).
Even with {{hdds.datanode.dir.du.reserved}} or
{{hdds.datanode.dir.du.reserved.percent}} configured, the reservation does not
appear to be fully respected by the datanode itself, because we have seen some
volumes with 0 bytes left. Moreover, we cannot control what other applications
sharing the volume mount might write to it.
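As a rough illustration of how such a reservation would be enforced (a
hypothetical sketch, not the actual Ozone implementation; the class and method
names here are made up):

```java
// Hypothetical sketch of a reserved-space check in the spirit of
// hdds.datanode.dir.du.reserved / hdds.datanode.dir.du.reserved.percent.
// Not Ozone code; names and signatures are invented for illustration.
public class ReservedSpaceCheck {

    /**
     * Returns true if the volume still has room for a write of the given
     * size after honoring the configured reservation. Takes the larger of
     * the absolute and percentage-based reservations.
     */
    static boolean hasSpaceFor(long usableBytes, long capacityBytes,
                               long reservedBytes, double reservedPercent,
                               long writeSizeBytes) {
        long reserved = Math.max(reservedBytes,
                (long) (capacityBytes * reservedPercent));
        return usableBytes - reserved >= writeSizeBytes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long mib = 1L << 20;
        // 1 GiB capacity, 50 MiB usable, 100 MiB reserved: reject a 4 MiB write.
        System.out.println(hasSpaceFor(50 * mib, gib, 100 * mib, 0.0, 4 * mib));  // -> false
        // 500 MiB usable leaves headroom after the reservation: accept it.
        System.out.println(hasSpaceFor(500 * mib, gib, 100 * mib, 0.0, 4 * mib)); // -> true
    }
}
```

The point of the sketch is that the check has to run on the write path itself;
a reservation enforced only by a periodic checker can still be overrun by
concurrent writers or by other applications on the same mount.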
----
List of Ozone datanode write operations (including Ratis ones) that need to be
checked, off the top of my head:
1. Ratis log append -- located under
{{dfs.container.ratis.datanode.storage.dir}}. When the mount is full, the
Ratis server shuts down (and the datanode might shut down with it)
2. WriteChunk
3. Container metadata RocksDB updates
4. StorageVolumeChecker canary -- interval controlled by
hdds.datanode.periodic.disk.check.interval.minutes
5. datanode log append (typically on a different volume, though)
6. During schema v3 container replication, metadata is dumped to an external
file -- see
[KeyValueContainer#packContainerToDestination|https://github.com/apache/ozone/blob/189a9fe42013e82f93becaede627e509322c38f6/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java#L1012].
Note that the blocks appear to be streamed over to the destination directly.
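For every operation above, the failure mode to probe is an {{IOException}}
raised on ENOSPC. A minimal sketch (hypothetical helper, not Ozone code) of
classifying that failure so a write path can abort cleanly instead of leaving
a partially written replica behind:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: classify an IOException as a disk-full condition so a
// write path can abort cleanly (e.g. fail the request, mark the volume bad)
// rather than leave a partial chunk or metadata file behind. Not Ozone code.
public class DiskFullClassifier {

    /** Heuristic: on Linux JVMs, ENOSPC surfaces in the exception message. */
    static boolean isDiskFull(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("No space left on device");
    }

    /** Belt and braces: check usable space before attempting a write. */
    static boolean hasUsableSpace(Path dir, long neededBytes) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace() >= neededBytes;
    }
}
```

Message-string matching is fragile across platforms, which is exactly why each
write path needs to be audited individually rather than relying on one shared
catch block.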
----
We have to make sure every one of these operations handles the "disk full" /
"no space left on device" situation gracefully. The goal is to ensure that
issues like container replica divergence do not happen again.
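One concrete pattern for this, in the spirit of the StorageVolumeChecker
canary (assumed behavior, a sketch rather than the actual implementation):
attempt a tiny write on the volume and treat any I/O failure, including disk
full, as the volume being unhealthy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical canary check: attempt a small write on the volume; any failure
// (disk full, read-only remount, hardware error) marks the volume unhealthy
// so it can be taken out of rotation before a real write diverges a replica.
public class VolumeCanary {
    static boolean isHealthy(Path volumeDir) {
        Path probe = volumeDir.resolve(".canary");
        try {
            Files.write(probe, "ok".getBytes(StandardCharsets.UTF_8));
            Files.deleteIfExists(probe);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A canary only catches the problem at the next check interval
({{hdds.datanode.periodic.disk.check.interval.minutes}}), so it complements,
but cannot replace, per-operation handling.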
[~erose]
> Check every single write operation on an Ozone datanode is handled correctly
> in the case of full volumes
> --------------------------------------------------------------------------------------------------------
>
> Key: HDDS-12151
> URL: https://issues.apache.org/jira/browse/HDDS-12151
> Project: Apache Ozone
> Issue Type: Task
> Reporter: Siyao Meng
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]