Wei-Chiu Chuang created HDDS-12189:
--------------------------------------
Summary: Improve RocksDB compaction in Ozone
Key: HDDS-12189
URL: https://issues.apache.org/jira/browse/HDDS-12189
Project: Apache Ozone
Issue Type: Epic
Components: Ozone Manager
Reporter: Wei-Chiu Chuang
Attachments: consecutive tombstone seek (1).png
In several large production Ozone deployments, we are seeing RocksDB iterator
seek become a performance bottleneck due to consecutive tombstones.
We need a way to compact RocksDB and restore performance when this happens,
ideally without human intervention.
h1. Symptoms
Users report that Ozone Manager becomes extremely slow to respond to any
request, causing workloads to miss their SLAs. A simple ‘ozone fs -ls’ command
takes several minutes to return.
Jstack output shows that most of the time is spent in RocksDB iterator seek.
The root cause is inefficient iterator seek when there are consecutive
tombstones due to many deletions. This is a [known
issue|https://github.com/facebook/rocksdb/issues/10300] in the RocksDB
community.
!consecutive tombstone seek (1).png!
h1. Expected behavior
Ozone should be able to auto-compact RocksDB without human intervention, and
there should be no noticeable performance impact afterwards.
h1. Background context
Having too many L0 SST files degrades performance. Therefore, by default,
RocksDB compacts them once the number of L0 files reaches 4.
When a RocksDB iterator seeks, it must skip tombstone keys (keys that are
marked for deletion). Seek performance degrades when there are too many
consecutive tombstone keys, which can happen when a large directory is deleted.
It is possible to encounter this situation before the number of L0 files grows
to 4 and triggers a compaction.
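
To make the cost concrete, here is a minimal toy model in plain Java. It is not RocksDB's implementation and does not use the RocksDB API: the class TombstoneSeekDemo, the TOMBSTONE sentinel, and the key names are all hypothetical. It assumes a delete only marks a key, that a seek must step over every marked key before it reaches a live one, and that compaction simply drops the markers (real RocksDB can drop a tombstone only under certain conditions).

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Toy model of a sorted key space where deletes leave tombstones behind. */
public class TombstoneSeekDemo {
    // Hypothetical sentinel value meaning "this key was deleted".
    private static final String TOMBSTONE = "\u0000TOMBSTONE";

    private final TreeMap<String, String> entries = new TreeMap<>();
    private long lastSeekSteps;

    public void put(String key, String value) {
        entries.put(key, value);
    }

    /** A delete does not remove the entry; it overwrites it with a tombstone. */
    public void delete(String key) {
        entries.put(key, TOMBSTONE);
    }

    /** Returns the first live key >= target, counting every entry examined. */
    public Optional<String> seek(String target) {
        lastSeekSteps = 0;
        for (Map.Entry<String, String> e : entries.tailMap(target, true).entrySet()) {
            lastSeekSteps++;
            if (!TOMBSTONE.equals(e.getValue())) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /** Simplified stand-in for compaction: physically drop all tombstones. */
    public void compact() {
        entries.values().removeIf(TOMBSTONE::equals);
    }

    public long lastSeekSteps() {
        return lastSeekSteps;
    }

    public static void main(String[] args) {
        TombstoneSeekDemo db = new TombstoneSeekDemo();
        // Populate a large directory plus one key that sorts after it.
        for (int i = 0; i < 100_000; i++) {
            db.put(String.format("/bigdir/file%06d", i), "v");
        }
        db.put("/zdir/file", "v");
        // Delete the whole directory: 100,000 consecutive tombstones.
        for (int i = 0; i < 100_000; i++) {
            db.delete(String.format("/bigdir/file%06d", i));
        }

        db.seek("/bigdir/");
        System.out.println("entries examined before compaction: " + db.lastSeekSteps());
        db.compact();
        db.seek("/bigdir/");
        System.out.println("entries examined after compaction:  " + db.lastSeekSteps());
    }
}
```

In this model the seek cost is proportional to the length of the tombstone run and is independent of how many SST files exist, which is why the problem can show up before the L0 file count triggers a compaction.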
h1. Workaround
Until a permanent solution is implemented, the workaround is to shut down a
follower OM, use the RocksDB ldb tool to manually compact fileTable, and
restart the OM; then repeat for the remaining OMs.
I wrote up the full details, as well as the proposed list of improvements, in a
Google Doc. Please take a look
[here|https://docs.google.com/document/d/1v9aE9aHPryTFRbJPDO4ECtbimkDv24RdAQ_i4sw0pxI/edit?usp=sharing].