[
https://issues.apache.org/jira/browse/HDDS-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924784#comment-17924784
]
Ivan Andika commented on HDDS-12189:
------------------------------------
For reference, TiKV also highlights some compaction-related optimizations for
RocksDB (https://tikv.github.io/deep-dive-tikv/key-value-engine/rocksdb.html)
> Improve RocksDB compaction in Ozone
> -----------------------------------
>
> Key: HDDS-12189
> URL: https://issues.apache.org/jira/browse/HDDS-12189
> Project: Apache Ozone
> Issue Type: Epic
> Components: Ozone Manager
> Reporter: Wei-Chiu Chuang
> Priority: Major
> Attachments: consecutive tombstone seek (1).png
>
>
> In several large production Ozone deployments, we are seeing RocksDB seeks
> become a performance bottleneck due to consecutive tombstones.
> We need a solution to compact RocksDB and restore performance when this
> happens, ideally without human intervention.
> h1. Symptoms
> Users report that Ozone Manager becomes extremely slow to respond to any
> request, causing workloads to miss their SLAs. A simple 'ozone fs -ls'
> command takes several minutes to return.
> A jstack dump shows that most of the time is spent in RocksDB iterator seeks.
> The root cause is inefficient iterator seek when there are consecutive
> tombstones due to many deletions. This is a [known
> issue|https://github.com/facebook/rocksdb/issues/10300] in the RocksDB
> community.
> !consecutive tombstone seek (1).png!
> h1. Expected behavior
> Ozone should be able to auto-compact RocksDB without human intervention, and
> there should be no noticeable performance impact afterwards.
> h1. Background context
> Having too many L0 SST files degrades read performance. Therefore, RocksDB
> compacts them when the number of L0 files reaches 4 (the default
> level0_file_num_compaction_trigger).
> When a RocksDB iterator seeks, it must skip tombstone keys (keys that are
> marked for deletion). Seek performance degrades when there are too many
> consecutive tombstone keys, which can happen when a large directory is
> deleted. It is possible to encounter this situation before the number of L0
> files grows to 4 and triggers a compaction.
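The tombstone-skipping cost described above can be sketched with a toy model. This is an illustration of the mechanism only, not RocksDB's actual data structures; the key names and counts are made up:

```python
import bisect

# Illustrative model only (not RocksDB internals): an SST file as a sorted list
# of (key, is_tombstone) entries. After a large directory delete, a seek must
# step over every consecutive tombstone before reaching the next live key, so
# its cost grows linearly with the number of deletions.
def seek(entries, target):
    """Return (first live key >= target, number of tombstones skipped)."""
    keys = [key for key, _ in entries]
    i = bisect.bisect_left(keys, target)
    skipped = 0
    while i < len(entries) and entries[i][1]:  # skip deletion markers
        skipped += 1
        i += 1
    return (entries[i][0] if i < len(entries) else None), skipped

# Deleting a directory with 100000 entries leaves that many consecutive
# tombstones ahead of the next live key.
table = [("dir1/%06d" % n, True) for n in range(100000)] + [("dir2/file", False)]
key, skipped = seek(table, "dir1/")
# The seek returns "dir2/file" only after stepping over all 100000 tombstones.
```

Compaction removes the tombstones physically, which is why a manual compaction restores seek performance until the next mass deletion.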
> h1. Workaround
> Before a permanent solution is implemented, the workaround is to shut down a
> follower OM, use the RocksDB ldb tool to manually compact the fileTable
> column family, and restart the OM; then repeat for the remaining OMs.
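As a sketch, the manual compaction step might look like the following; the om.db path is hypothetical and varies by deployment, and the OM must be stopped before running it:

```
# Hypothetical DB path; adjust for your deployment. Run only against a stopped OM.
ldb --db=/var/lib/ozone/om/data/om.db compact --column_family=fileTable
```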
>
> I wrote up the full details as well as the proposed list of improvements in a
> Google Doc. Please take a look
> [here|https://docs.google.com/document/d/1v9aE9aHPryTFRbJPDO4ECtbimkDv24RdAQ_i4sw0pxI/edit?usp=sharing].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]