Ritesh Shukla created HDDS-11412:
------------------------------------

             Summary: RocksDB LRU Cache seems to be wasting a lot of CPU time when dealing with nested FSO + heavy read load
                 Key: HDDS-11412
                 URL: https://issues.apache.org/jira/browse/HDDS-11412
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Ritesh Shukla
         Attachments: 10.deep.fso.read.shards.300.html, image-2024-09-04-13-24-42-394.png

When running a heavy OM GetKeyInfo load against a deeply nested FSO directory
structure, OM read performance degrades by around 100 µs per directory in the
path.
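
As a rough illustration of why the cost scales with depth: in the FSO layout each directory is stored under its parent's object ID, so resolving a key's parent path presumably takes one directory-table lookup per path component. The hypothetical Java sketch below (the class, method names and the "parentId/name" key format are illustrative, and a {{HashMap}} stands in for the RocksDB-backed directory table; this is not OM's actual code) counts those lookups for the 10-level prefix from the sample command. At the observed ~100 µs per directory, that would add roughly 1 ms of OM latency per read.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of FSO-style path resolution (not OM code).
 * Each directory is stored under a "parentId/name" key, so resolving a
 * path costs one point lookup per directory component.
 */
public class FsoPathResolution {
    static final long BUCKET_ID = 1;   // hypothetical object id of the bucket root

    // Stand-in for the RocksDB-backed directory table.
    private final Map<String, Long> directoryTable = new HashMap<>();
    private long nextId = BUCKET_ID + 1;
    private int lookups = 0;           // point lookups issued by resolve()

    private static String dbKey(long parentId, String name) {
        return parentId + "/" + name;
    }

    /** Creates every directory in the path, like "mkdir -p". */
    void mkdirs(String path) {
        long parent = BUCKET_ID;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;
            Long id = directoryTable.get(dbKey(parent, name));
            if (id == null) {
                id = nextId++;
                directoryTable.put(dbKey(parent, name), id);
            }
            parent = id;
        }
    }

    /** Resolves a directory path to its object id, one lookup per component. */
    long resolve(String path) {
        long parent = BUCKET_ID;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;
            lookups++;
            Long id = directoryTable.get(dbKey(parent, name));
            if (id == null) {
                throw new IllegalArgumentException("missing directory: " + name);
            }
            parent = id;
        }
        return parent;
    }

    int lookups() { return lookups; }

    public static void main(String[] args) {
        String prefix = "adfifzctmq/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/";
        FsoPathResolution om = new FsoPathResolution();
        om.mkdirs(prefix);
        om.resolve(prefix);
        // ~100 us per directory is the figure observed in this report.
        System.out.println("lookups per read: " + om.lookups());
        System.out.println("estimated overhead (us): " + om.lookups() * 100);
    }
}
{code}

Running {{main}} prints 10 lookups and an estimate of 1000 µs; the per-directory cost is taken from this report, not measured by the sketch.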

Sample command:
{code:bash}
ozone freon --set ozone.network.topology.aware.read=false ockrw -t 90 -v ritesh \
  -b bucket -n 10000000 -r 300 -s 0 --contiguous --percentage-read=100 \
  --percentage-list=0 -m --prefix \
  adfifzctmq/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/
{code}
The flame graph for the leader OM shows it spending a large share of its cycles
inside the configured LRU cache: doing cleanup and lookups, and waiting on locks.
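
A possible reason the cache burns CPU even on a small, hot working set: an LRU cache has to move an entry to the most-recently-used position on every hit, so a read mutates shared state and needs the same lock as inserts and evictions. To our understanding, RocksDB's LRU cache is sharded and each shard is guarded by a mutex that lookups also acquire; since every read in this workload resolves the same few directories, all concurrent readers would hit the same entries and shards. The sketch below shows the general pattern in Java with an access-ordered {{LinkedHashMap}}; the {{LockedLruCache}} class is hypothetical and is not RocksDB's C++ implementation.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not RocksDB code) of why an LRU cache serializes
 * readers: a cache hit relinks the entry to the most-recently-used end,
 * so even a read mutates shared state and must hold the lock.
 */
public class LockedLruCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> map;

    public LockedLruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: get() moves the accessed entry to the tail.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Cleanup: evict the least-recently-used entry once over capacity.
                return size() > LockedLruCache.this.capacity;
            }
        };
    }

    // Lookups take the same monitor as inserts and evictions, so concurrent
    // readers of a tiny, hot working set still queue up behind each other.
    public synchronized V get(K key) { return map.get(key); }

    public synchronized void put(K key, V value) { map.put(key, value); }

    /** Keys from least- to most-recently used (for inspection only). */
    public synchronized String order() { return map.keySet().toString(); }

    public static void main(String[] args) {
        LockedLruCache<String, Integer> cache = new LockedLruCache<>(3);
        cache.put("dir2", 2);
        cache.put("dir3", 3);
        cache.put("dir4", 4);
        cache.get("dir2");                 // a pure read...
        System.out.println(cache.order()); // ...reorders the list: [dir3, dir4, dir2]
        cache.put("dir5", 5);              // evicts dir3, now least recently used
        System.out.println(cache.order()); // [dir4, dir2, dir5]
    }
}
{code}

The point of the example is that the read path and the eviction path share one lock; with many threads hammering the same few keys, time goes to lock handoff rather than to useful work.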

Since this is a pure read load, I expected any slowdown to come from disk or
cache pressure, but this one appears to be artificial and to originate in the
LRU cache itself. The slowdown is present even with a working set of only a few
hundred keys.

Filing this to record the data captured so far and to investigate further.

!image-2024-09-04-13-24-42-394.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
