Wei-Chiu Chuang created HDDS-12189:
--------------------------------------

             Summary: Improve RocksDB compaction in Ozone
                 Key: HDDS-12189
                 URL: https://issues.apache.org/jira/browse/HDDS-12189
             Project: Apache Ozone
          Issue Type: Epic
          Components: Ozone Manager
            Reporter: Wei-Chiu Chuang
         Attachments: consecutive tombstone seek (1).png

In several large production Ozone deployments, we are seeing RocksDB seek 
become a performance bottleneck due to consecutive tombstones.

We need a solution that compacts RocksDB and restores performance when this 
happens, ideally without human intervention.
h1. Symptoms

Users report that Ozone Manager becomes extremely slow to respond to any 
request, causing workloads to miss their SLAs. A simple ‘ozone fs -ls’ command 
takes several minutes to return.

jstack output shows that most of the time is spent in RocksDB iterator seek.

The root cause is inefficient iterator seek when there are consecutive 
tombstones due to many deletions. This is a [known 
issue|https://github.com/facebook/rocksdb/issues/10300] in the RocksDB 
community.

!consecutive tombstone seek (1).png!
h1. Expected behavior

Ozone should be able to auto-compact RocksDB without human intervention, and 
there should be no noticeable performance impact afterwards.
h1. Background context

Having too many L0 SST files degrades performance, so RocksDB compacts them 
when the number of L0 files grows to 4 (the RocksDB default for 
level0_file_num_compaction_trigger).

When a RocksDB iterator seeks, it must skip tombstone keys (keys that are 
marked for deletion). Seek performance degrades when there are too many 
consecutive tombstone keys, which can happen when a large directory is deleted. 
It is possible to encounter this situation before the number of L0 files grows 
to 4 and triggers a compaction.
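
To illustrate why consecutive tombstones hurt, here is a toy Python model (not 
RocksDB code). It assumes the iterator walks a single sorted list of entries 
and that a compaction simply drops tombstones; real RocksDB merges several 
levels and only drops tombstones under certain conditions. The names 
TOMBSTONE, seek and compact, as well as the key layout, are made up for this 
sketch.

```python
from bisect import bisect_left

# Sentinel marking a deleted key (illustrative only, not RocksDB's encoding).
TOMBSTONE = object()


def seek(entries, target):
    """Return (first live key >= target, number of tombstones skipped).

    `entries` is a list of (key, value) pairs sorted by key, standing in for
    the merged view that an iterator walks over.
    """
    keys = [k for k, _ in entries]
    i = bisect_left(keys, target)  # binary search to the start position
    skipped = 0
    # Every tombstone at or after the start position has to be stepped over
    # before the iterator can return a live key.
    while i < len(entries) and entries[i][1] is TOMBSTONE:
        skipped += 1
        i += 1
    live_key = entries[i][0] if i < len(entries) else None
    return live_key, skipped


def compact(entries):
    """Drop tombstones (a simplified stand-in for a full compaction)."""
    return [(k, v) for k, v in entries if v is not TOMBSTONE]


# A large directory whose 100,000 children were all deleted, followed by one
# live key that sorts after them.
entries = [(f"/vol/bucket/dir/file{i:06d}", TOMBSTONE) for i in range(100_000)]
entries.append(("/vol/bucket/other", "live"))

before = seek(entries, "/vol/bucket/dir/")
after = seek(compact(entries), "/vol/bucket/dir/")
print("before compaction:", before)
print("after compaction: ", after)
```

In this model the seek before compaction steps over all 100,000 tombstones to 
reach the next live key, while the same seek after compaction skips none.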
h1. Workaround

Until a permanent solution is implemented, the workaround is to shut down a 
follower OM, use the RocksDB ldb tool to manually compact fileTable, and 
restart the OM, then repeat for the remaining OMs.
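
The workaround above could be scripted roughly as follows. This is only a 
sketch that prints the commands for each OM rather than running them. It 
assumes that OMs are stopped and started with 'ozone --daemon stop om' and 
'ozone --daemon start om', that the ldb binary on the host matches the RocksDB 
version Ozone uses, and that ldb's compact command and --column_family option 
are available. The host names and the om.db path are placeholders, and 
rolling_compaction_plan is a hypothetical helper.

```python
import shlex


def rolling_compaction_plan(om_hosts, db_path, table="fileTable"):
    """Build the per-OM command list for a one-node-at-a-time compaction.

    `om_hosts` should be listed in the order they are to be processed,
    starting with a follower. Nothing is executed here; the caller decides
    how (and whether) to run each command on each host.
    """
    plan = []
    for host in om_hosts:
        steps = [
            # Stop the OM so that ldb can open the DB exclusively.
            ["ozone", "--daemon", "stop", "om"],
            # Manually compact the one column family that holds the tombstones.
            ["ldb", f"--db={db_path}", f"--column_family={table}", "compact"],
            # Bring the OM back before moving on to the next node.
            ["ozone", "--daemon", "start", "om"],
        ]
        plan.append((host, [shlex.join(step) for step in steps]))
    return plan


# Placeholder hosts and DB path; substitute the values for the real cluster.
plan = rolling_compaction_plan(
    ["om1.example.com", "om2.example.com", "om3.example.com"],
    "/path/to/om.db",
)
for host, commands in plan:
    print(host)
    for command in commands:
        print("  " + command)
```

Processing one OM at a time keeps the other OMs available to serve requests 
while each node is offline for its compaction.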

 

I wrote up the full details, as well as the proposed list of improvements, in 
a Google Doc. Please take a look 
[here|https://docs.google.com/document/d/1v9aE9aHPryTFRbJPDO4ECtbimkDv24RdAQ_i4sw0pxI/edit?usp=sharing].


