Shilun Fan created HDDS-11461:
---------------------------------
Summary: Improve the impact of DataNode I/O
Key: HDDS-11461
URL: https://issues.apache.org/jira/browse/HDDS-11461
Project: Apache Ozone
Issue Type: Improvement
Components: Ozone Datanode
Environment: Our object storage service is built on Ozone and
currently has over 3K nodes across different clusters. Customers have strict
P99 latency requirements for access to the system.
Under normal circumstances, reading 200 bytes of data might take 10ms to 20ms.
However, monitoring data sometimes shows that reading 200 bytes can take up to
500ms.
While investigating the issue on the DN, we found that when the machine hosting
the DN experiences high I/O wait or system load, DN access performance
degrades.
High I/O wait or system load has several causes, including DataScanner scans,
EC block recovery, and containers being in an UnderReplicated state.
We aim to design a mechanism that allows a DN to sense the system's I/O
conditions to some extent (such as high system load, high I/O wait, slow
network, or slow disk) and report this data to the SCM.
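As a rough illustration of the sensing step, the sketch below derives an I/O
wait fraction from two samples of the aggregate "cpu" line in Linux
/proc/stat. The class and method names (IoWaitSampler, parseCpuLine,
ioWaitFraction) are hypothetical and not part of Ozone; how the value would be
reported to the SCM is left out because the issue does not specify it.

```java
/** Hypothetical helper for illustration; not an existing Ozone class. */
final class IoWaitSampler {

  private IoWaitSampler() { }

  /**
   * Parses the aggregate "cpu" line of /proc/stat. The first eight counters
   * are user, nice, system, idle, iowait, irq, softirq and steal. The guest
   * fields are skipped because the kernel already counts them in user/nice.
   */
  static long[] parseCpuLine(String line) {
    String[] parts = line.trim().split("\\s+");
    int n = Math.min(8, parts.length - 1); // parts[0] is the "cpu" label
    long[] counters = new long[n];
    for (int i = 0; i < n; i++) {
      counters[i] = Long.parseLong(parts[i + 1]);
    }
    return counters;
  }

  /**
   * Returns the share of CPU time spent in iowait between two samples,
   * as a value in [0, 1]. Returns 0.0 if the samples are unusable.
   */
  static double ioWaitFraction(String before, String after) {
    long[] a = parseCpuLine(before);
    long[] b = parseCpuLine(after);
    if (a.length < 5 || b.length < 5) {
      return 0.0; // iowait column (index 4) is missing
    }
    long total = 0;
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      total += b[i] - a[i];
    }
    if (total <= 0) {
      return 0.0; // no time elapsed between samples
    }
    return (double) (b[4] - a[4]) / total;
  }
}
```

A DN could take one sample per heartbeat interval and feed the two most recent
lines to ioWaitFraction; the sampling interval is an open design choice.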
This data will be used to enhance system functionality:
When a DN detects high I/O or degraded read/write performance:
- Automatically reduce the rate of DataScanner scans.
- If a specific disk's performance deteriorates, skip that disk during data
writes.
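The two DN-side reactions above could be sketched roughly as follows. This is
only an illustration under stated assumptions: IoAwareDatanodePolicy, its
methods, the linear scaling rule and all thresholds are hypothetical, and none
of them correspond to existing Ozone classes or configuration keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical policy helper for illustration; not an existing Ozone class. */
final class IoAwareDatanodePolicy {

  private IoAwareDatanodePolicy() { }

  /**
   * Scales the scanner bandwidth down linearly once the I/O wait fraction
   * exceeds the threshold, never dropping below minRate.
   * Assumes 0 <= ioWait <= 1 and 0 <= threshold < 1.
   */
  static long scannerBytesPerSecond(long baseRate, long minRate,
      double ioWait, double threshold) {
    if (ioWait <= threshold) {
      return baseRate; // system is healthy, scan at full speed
    }
    // How far we are between the threshold and full saturation (1.0).
    double overload = Math.min(1.0, (ioWait - threshold) / (1.0 - threshold));
    long scaled = Math.round(baseRate * (1.0 - overload));
    return Math.max(minRate, scaled);
  }

  /**
   * Returns the volumes whose recent write latency is at or below slowMs,
   * sorted by name. If every volume is slow, all volumes are returned so
   * that writes are never blocked entirely.
   */
  static List<String> writableVolumes(Map<String, Double> latencyMs,
      double slowMs) {
    Map<String, Double> sorted = new TreeMap<>(latencyMs);
    List<String> healthy = new ArrayList<>();
    for (Map.Entry<String, Double> e : sorted.entrySet()) {
      if (e.getValue() <= slowMs) {
        healthy.add(e.getKey());
      }
    }
    return healthy.isEmpty() ? new ArrayList<>(sorted.keySet()) : healthy;
  }
}
```

Keeping a non-zero minRate means the scanner still makes progress under
sustained load; whether scanning should instead pause completely is a design
question the issue leaves open.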
When the SCM detects high I/O or degraded read/write performance on DNs:
- Issue commands to bypass these poorly performing DNs.
- When returning a list of DNs to clients for data reads, place the degraded
DNs at the end of the list.
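The read-ordering idea can be illustrated with a stable sort that moves
degraded DNs to the end while preserving the existing order (for example, the
topology-based order) among the rest. DegradedLastOrdering and its order
method are hypothetical names; this sketch uses plain strings as DN
identifiers and does not reflect actual SCM code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not an existing Ozone class. */
final class DegradedLastOrdering {

  private DegradedLastOrdering() { }

  /**
   * Returns a copy of the node list with degraded nodes moved to the end.
   * List.sort is stable, and Boolean.FALSE sorts before Boolean.TRUE, so the
   * relative order inside the healthy and degraded groups is unchanged.
   */
  static List<String> order(List<String> nodes, Set<String> degraded) {
    List<String> result = new ArrayList<>(nodes);
    result.sort(Comparator.comparing((String dn) -> degraded.contains(dn)));
    return result;
  }
}
```

Because degraded DNs stay in the list rather than being dropped, clients can
still fall back to them if the healthy replicas are unreachable.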
Reporter: Shilun Fan
Assignee: Shilun Fan