Ritesh Shukla created HDDS-11412:
------------------------------------
Summary: RocksDB LRU Cache seems to be wasting a lot of CPU time
when dealing with nested FSO + heavy read load
Key: HDDS-11412
URL: https://issues.apache.org/jira/browse/HDDS-11412
Project: Apache Ozone
Issue Type: Bug
Reporter: Ritesh Shukla
Attachments: 10.deep.fso.read.shards.300.html,
image-2024-09-04-13-24-42-394.png
Under a heavy OM GetKeyInfo load on a deeply nested FSO directory structure,
OM read performance degrades by around 100 us per directory in the path.
Sample command:
{code:java}
ozone freon --set ozone.network.topology.aware.read=false ockrw -t 90 -v ritesh
-b bucket -n 10000000 -r 300 -s 0 --contiguous --percentage-read=100
--percentage-list=0 -m --prefix
adfifzctmq/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/{code}
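For a rough sense of scale, the sketch below counts the directory components
in the sample prefix and multiplies by the roughly 100 us per-directory figure
noted above. The class and method names are hypothetical, and the linear cost
model is only an assumption drawn from that observation, not a measurement.

```java
public class PathDepthEstimate {
    // Approximate per-directory overhead from the observation above (assumed constant).
    static final long PER_DIR_OVERHEAD_US = 100;

    // Counts the non-empty components of a '/'-separated prefix.
    static int directoryDepth(String prefix) {
        int depth = 0;
        for (String part : prefix.split("/")) {
            if (!part.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    // Estimated added latency per lookup under the assumed linear model.
    static long estimatedOverheadUs(String prefix) {
        return directoryDepth(prefix) * PER_DIR_OVERHEAD_US;
    }

    public static void main(String[] args) {
        String prefix = "adfifzctmq/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/";
        System.out.println(directoryDepth(prefix) + " directories, ~"
                + estimatedOverheadUs(prefix) + " us estimated overhead per lookup");
    }
}
```

Under that assumption, the 10-directory prefix in the sample command would add
on the order of 1 ms to each GetKeyInfo call.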
The flame graph for the leader shows OM spending a lot of CPU cycles in the
configured LRU cache: cleanup, lookup, and waiting on locks.
Since this is a pure read load, I expected any slowdown to come from load
on the disk or cache, but this seems to be artificial and an issue with the LRU
cache. Even with a working set of a few hundred keys, the slowdown is present.
Filing this to record the data captured so far and to investigate further.
!image-2024-09-04-13-24-42-394.png!