[
https://issues.apache.org/jira/browse/HDDS-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924784#comment-17924784
]
Ivan Andika commented on HDDS-12189:
------------------------------------
For reference, TiKV also highlights some compaction-related optimizations for
RocksDB (https://tikv.github.io/deep-dive-tikv/key-value-engine/rocksdb.html)
> Improve RocksDB compaction in Ozone
> -----------------------------------
>
> Key: HDDS-12189
> URL: https://issues.apache.org/jira/browse/HDDS-12189
> Project: Apache Ozone
> Issue Type: Epic
> Components: Ozone Manager
> Reporter: Wei-Chiu Chuang
> Priority: Major
> Attachments: consecutive tombstone seek (1).png
>
>
> In several large production Ozone deployments, we are seeing RocksDB seeks
> become a performance bottleneck due to consecutive tombstones.
> We need a solution to compact RocksDB and restore performance when this
> happens, ideally without human intervention.
> h1. Symptoms
> Users report that Ozone Manager becomes extremely slow to respond to any
> request, causing workloads to miss their SLAs. A simple 'ozone fs -ls'
> command takes several minutes to return.
> A jstack dump shows that most of the time is spent in RocksDB iterator seeks.
> The root cause is inefficient iterator seek when there are consecutive
> tombstones due to many deletions. This is a [known
> issue|https://github.com/facebook/rocksdb/issues/10300] in the RocksDB
> community.
> !consecutive tombstone seek (1).png!
> h1. Expected behavior
> Ozone should be able to auto-compact RocksDB without human intervention, and
> there should be no noticeable performance impact afterwards.
> h1. Background context
> Having too many L0 SST files degrades read performance. Therefore, RocksDB
> compacts them when the number of L0 files reaches 4 (the default
> level0_file_num_compaction_trigger).
> When a RocksDB iterator seeks, it must skip tombstone keys (keys that are
> marked for deletion). Seek performance degrades when there are too many
> consecutive tombstone keys, which can happen when a large directory is
> deleted. It is possible to encounter this situation before the number of L0
> files grows to 4 and triggers a compaction.
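The tombstone-skipping cost described above can be sketched with a toy model. This is an illustration of the mechanism only, not RocksDB's actual data structures; the key names and counts are made up:

```python
import bisect

# Illustrative model only (not RocksDB internals): an SST file as a sorted list
# of (key, is_tombstone) entries. After a large directory delete, a seek must
# step over every consecutive tombstone before reaching the next live key, so
# its cost grows linearly with the number of deletions.
def seek(entries, target):
    """Return (first live key >= target, number of tombstones skipped)."""
    keys = [key for key, _ in entries]
    i = bisect.bisect_left(keys, target)
    skipped = 0
    while i < len(entries) and entries[i][1]:  # skip deletion markers
        skipped += 1
        i += 1
    return (entries[i][0] if i < len(entries) else None), skipped

# Deleting a directory with 100000 entries leaves that many consecutive
# tombstones ahead of the next live key.
table = [("dir1/%06d" % n, True) for n in range(100000)] + [("dir2/file", False)]
key, skipped = seek(table, "dir1/")
# The seek returns "dir2/file" only after stepping over all 100000 tombstones.
```

Compaction removes the tombstones physically, which is why a manual compaction restores seek performance until the next mass deletion.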
> h1. Workaround
> Before a permanent solution is implemented, the workaround is to shut down a
> follower OM, use the RocksDB ldb tool to manually compact the fileTable
> column family, and restart the OM; then repeat for the remaining OMs.
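As a sketch, the manual compaction step might look like the following; the om.db path is hypothetical and varies by deployment, and the OM must be stopped before running it:

```
# Hypothetical DB path; adjust for your deployment. Run only against a stopped OM.
ldb --db=/var/lib/ozone/om/data/om.db compact --column_family=fileTable
```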
>
> I wrote up the full details as well as the proposed list of improvements in a
> Google Doc. Please take a look
> [here|https://docs.google.com/document/d/1v9aE9aHPryTFRbJPDO4ECtbimkDv24RdAQ_i4sw0pxI/edit?usp=sharing].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]