[ 
https://issues.apache.org/jira/browse/HDDS-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duong updated HDDS-8765:
------------------------
    Labels: ozone-performance  (was: )

> OM (read) lock performance improvement
> --------------------------------------
>
>                 Key: HDDS-8765
>                 URL: https://issues.apache.org/jira/browse/HDDS-8765
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Duong
>            Priority: Major
>              Labels: ozone-performance
>         Attachments: Screenshot 2023-06-05 at 4.31.14 PM.png, 
> image-2023-06-05-22-49-47-147.png, latency-after.png, latency-before.png
>
>
> h2. Problem
> Today, OM manages resource (volume, bucket) locks by LockManager, a component 
> enclosing a lock table as a ConcurrentHashMap to store all active locks. 
> Locks are dynamically allocated and destroyed in the lock table based on 
> runtime needs. This means, for every lock allocated, a usage count is kept up 
> to date to decide when the lock is no longer needed. 
> The current performance of LockManager is limited by the cost of maintaining 
> individual lock liveness, aka, counting how many concurrent usages to a lock 
> and removing it from the lock table when it's no longer used.
> This cost mainly incurs from the need to *synchronize* all the concurrent 
> access to every lock (or technically, a ConcurrentHashMap's node) when:
>  # When getting the lock to obtain: create a lock object if it's not existing 
> in the table and increase the lock usage count.
>  # When releasing the lock: decrease the usage count and remove the lock when 
> the usage count is 0.
> !Screenshot 2023-06-05 at 4.31.14 PM.png|width=764,height=243!
> This synchronization happens internally inside two ConcurrentHashMap 
> methods: _compute_ and _computeIfPresent_. It creates a bottleneck whenever 
> multiple threads try to obtain and release the same lock, even for read 
> locks.
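> The sketch below is a hypothetical simplification of this pattern, not the 
> actual LockManager code. Both paths funnel through _compute_ / 
> _computeIfPresent_, which lock the map node and serialize all concurrent 
> users of the same resource:
> {code:java}
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> // Hypothetical sketch of dynamic per-resource lock management.
> class DynamicLockTable {
>   private static final class CountedLock {
>     final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>     int users; // usage count, guarded by the map node's internal monitor
>   }
>
>   private final ConcurrentHashMap<String, CountedLock> table =
>       new ConcurrentHashMap<>();
>
>   ReentrantReadWriteLock acquire(String resource) {
>     // compute() synchronizes on the map node even when the entry already
>     // exists, so concurrent acquirers of the same resource serialize here.
>     return table.compute(resource, (k, v) -> {
>       if (v == null) {
>         v = new CountedLock(); // allocate on first use
>       }
>       v.users++;
>       return v;
>     }).lock;
>   }
>
>   void release(String resource) {
>     // computeIfPresent() also synchronizes on the node; returning null
>     // removes the lock from the table once the last user releases it.
>     table.computeIfPresent(resource, (k, v) -> {
>       v.users--;
>       return v.users == 0 ? null : v;
>     });
>   }
> }
> {code}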
> h2. Experiment
> I ran an experiment with pure OM key reads in the same bucket using 100 
> reader threads. The freon command looks like the following:
> {code:bash}
> ozone freon ockrw -v duong -b obs --contiguous -n 50000000 -p zerobytes --percentage-read=100 -r 5000000 -s 0 --size 0 -m -t 100
> {code}
> With the current code, the total ops/s tops out at ~100K and getKeyInfo 
> latency is ~800μs. Obtaining and releasing a lock takes ~40μs. Note that for 
> each getKeyInfo request, OM obtains and releases the volume and bucket locks 
> multiple times.
> !image-2023-06-05-22-49-47-147.png|width=864,height=433!
> With a [quick and naive 
> change|https://github.com/duongkame/ozone/commit/91c2729cef1649f038b1560d260d41f91de4c0b2]
>  that removes the synchronization when obtaining and releasing the locks, 
> the getKeyInfo latency drops to ~400μs and total ops/s rises to ~160K. The 
> time to obtain and release a lock drops to 4-5μs. Please note that this 
> change only demonstrates the impact of the synchronization and is not 
> practical, as synchronization is essential to the correctness of dynamic 
> lock management.
> !latency-after.png|width=865,height=449!
> h2. Proposed solution
> As shown above, dynamic lock allocation (and deallocation) per resource is 
> costly, especially for hotspots like bucket/volume-level locks. The 
> synchronization is costly even under single-threaded access.
> A simpler and more performant solution is lock striping: preallocate an 
> array of locks and select the lock for a resource based on its hashcode % 
> array_size. Locks are preallocated, reused, and never deallocated.
> Implementation-wise, we don't need to reinvent the wheel; Guava's 
> [Striped|https://guava.dev/releases/19.0/api/docs/com/google/common/util/concurrent/Striped.html]
>  already provides this.
> To minimize the chance of collision (different resources mapped to the same 
> lock), the array size (the number of stripes, in Guava terms) can be a 
> multiple of the number of worker threads. A 10x factor makes the collision 
> probability negligible (~0%).
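> A minimal sketch of how this could look (the stripe count and key format 
> are illustrative assumptions, not the proposed OM code):
> {code:java}
> import com.google.common.util.concurrent.Striped;
> import java.util.concurrent.locks.Lock;
> import java.util.concurrent.locks.ReadWriteLock;
>
> class StripedLockExample {
>   // Hypothetical handler pool size; 10x stripes to make collisions rare.
>   private static final int WORKER_THREADS = 100;
>
>   // Striped.readWriteLock() eagerly preallocates all stripes; they are
>   // reused and never deallocated.
>   private final Striped<ReadWriteLock> bucketLocks =
>       Striped.readWriteLock(WORKER_THREADS * 10);
>
>   void readUnderBucketLock(String volume, String bucket) {
>     // get() hashes the key onto a fixed stripe: no allocation, no usage
>     // count, no removal bookkeeping.
>     Lock readLock = bucketLocks.get(volume + "/" + bucket).readLock();
>     readLock.lock();
>     try {
>       // ... serve the read under the bucket read lock ...
>     } finally {
>       readLock.unlock();
>     }
>   }
> }
> {code}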
> We may also introduce a separate Striped instance for each resource type, 
> like volume, bucket, or key, to avoid cross-resource-type collisions. (Note 
> that we haven't introduced key-level locks yet, but they are an important 
> future project.)
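> For example (the field names and stripeCount are illustrative):
> {code:java}
> // One hypothetical Striped instance per resource type, so resources of
> // different types never contend on the same stripe.
> Striped<ReadWriteLock> volumeLocks = Striped.readWriteLock(stripeCount);
> Striped<ReadWriteLock> bucketLocks = Striped.readWriteLock(stripeCount);
> Striped<ReadWriteLock> keyLocks = Striped.readWriteLock(stripeCount);
> {code}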
> h3. Pros
>  * Fast and scalable: eliminates the cost of dynamic allocation/deallocation 
> of locks, especially for hotspots (e.g. hot bucket-level locks) or 
> high-cardinality resources (e.g. key-level locks).
>  * Simple: no synchronization complexity, no deallocation, and an 
> off-the-shelf library.
>  * GC friendly.
> h3. Cons
>  * Introduces a risk of lock collision, yet this can be made negligible by 
> extending the lock array to e.g. 10x the number of threads (which is not 
> expensive).
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
