[
https://issues.apache.org/jira/browse/HDDS-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duong updated HDDS-8765:
------------------------
Component/s: common
OM
> OM (read) lock performance improvement
> --------------------------------------
>
> Key: HDDS-8765
> URL: https://issues.apache.org/jira/browse/HDDS-8765
> Project: Apache Ozone
> Issue Type: Improvement
> Components: common, OM
> Reporter: Duong
> Priority: Major
> Labels: ozone-performance
> Attachments: Screenshot 2023-06-05 at 4.31.14 PM.png,
> image-2023-06-05-22-49-47-147.png, latency-after.png, latency-before.png
>
>
> h2. Problem
> Today, OM manages resource (volume, bucket) locks through LockManager, a
> component that wraps a lock table (a ConcurrentHashMap) holding all active locks.
> Locks are dynamically allocated and destroyed in the lock table based on
> runtime needs. This means that, for every allocated lock, a usage count is kept
> up to date to decide when the lock is no longer needed.
> The current performance of LockManager is limited by the cost of maintaining
> individual lock liveness, i.e., counting how many concurrent users a lock has
> and removing it from the lock table when it's no longer used.
> This cost mainly comes from the need to *synchronize* all concurrent
> access to every lock (technically, to a ConcurrentHashMap node) when:
> # Obtaining the lock: create the lock object if it does not yet exist
> in the table and increase its usage count.
> # Releasing the lock: decrease the usage count and remove the lock when
> the count reaches 0.
> !Screenshot 2023-06-05 at 4.31.14 PM.png|width=764,height=243!
> This synchronization is done internally inside ConcurrentHashMap's two
> methods, _compute_ and _computeIfPresent_.
> This synchronization creates a bottleneck when multiple threads try to obtain
> and release the same lock, even for read locks.
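> Roughly, the pattern described above can be sketched as follows. This is
> illustrative only, not the actual LockManager code; the class and field names
> are made up for the example:
> {code:java}
> // Illustrative sketch of the dynamic lock-table pattern described above.
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.atomic.AtomicInteger;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> class DynamicLockTable {
>   static class ActiveLock {
>     final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>     final AtomicInteger users = new AtomicInteger();
>   }
>
>   private final ConcurrentHashMap<String, ActiveLock> table = new ConcurrentHashMap<>();
>
>   ActiveLock acquire(String resource) {
>     // compute() synchronizes on the map node, so every acquire of the same
>     // resource serializes here, even when callers only want a read lock.
>     return table.compute(resource, (k, l) -> {
>       if (l == null) {
>         l = new ActiveLock();      // allocate lazily on first use
>       }
>       l.users.incrementAndGet();   // liveness accounting
>       return l;
>     });
>   }
>
>   void release(String resource) {
>     // computeIfPresent() synchronizes again to decrement the count and
>     // remove the entry once the lock has no more users.
>     table.computeIfPresent(resource, (k, l) ->
>         l.users.decrementAndGet() == 0 ? null : l);
>   }
> }
> {code}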
> h2. Experiment
> I ran an experiment of pure OM key reads in the same bucket with 100 reader
> threads. The freon command looks like the following:
> {code:java}
> ozone freon ockrw -v duong -b obs --contiguous -n 50000000 -p zerobytes
> --percentage-read=100 -r 5000000 -s 0 --size 0 -m -t 100 {code}
> With the current code, the total OPPS tops out at ~100K and getKeyInfo latency
> is ~800μs. The time taken to obtain and release a lock is ~40μs. Note that for
> each getKeyInfo request, OM obtains and releases the volume and bucket locks
> multiple times.
> !image-2023-06-05-22-49-47-147.png|width=864,height=433!
> With a [quick and naive
> change|https://github.com/duongkame/ozone/commit/91c2729cef1649f038b1560d260d41f91de4c0b2]
> that removes the synchronization when getting and releasing the locks, the
> getKeyInfo latency drops to ~400μs and total OPPS rises to ~160K. The time to
> obtain and release a lock drops to 4-5μs. Please note that this is only meant
> to demonstrate the impact of the synchronization and is not a practical
> change, as the synchronization is essential to the correctness of the dynamic
> lock management. A rough sketch of the kind of change is shown after the
> screenshot below.
> !latency-after.png|width=865,height=449!
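> The sketch below is illustrative only (not the linked commit itself): the
> lock-table lookup skips per-node synchronization and entries are never
> deallocated, so there is no usage counting on the hot path. It reuses the
> hypothetical DynamicLockTable fields from the earlier sketch:
> {code:java}
> // Rough, illustrative sketch of removing usage counting from the hot path.
> ActiveLock acquire(String resource) {
>   ActiveLock l = table.get(resource);            // lock-free fast path
>   return l != null ? l : table.computeIfAbsent(resource, k -> new ActiveLock());
> }
>
> void release(String resource) {
>   // no-op for the lock table: entries are never deallocated in this sketch,
>   // which is why it is not a correct replacement for the real LockManager
> }
> {code}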
> h2. Proposed solution
> It has been shown that dynamic lock allocation (and deallocation) per resource
> is costly, especially for hotspots like bucket/volume-level locks. The
> synchronization is costly even for single-threaded access.
> A simpler and more performant solution is lock striping. The idea is
> straightforward: preallocate an array of locks and select the lock for a
> resource based on its hashcode % array_size. Locks are preallocated, reused,
> and never deallocated.
> Implementation-wise, we don't need to reinvent the wheel; we can use Guava's
> [Striped|https://guava.dev/releases/19.0/api/docs/com/google/common/util/concurrent/Striped.html].
> To minimize the chance of collision (different resources mapped to the same
> lock), the array size (the number of stripes, in Guava terms) can be a
> multiple of the number of worker threads. A 10x factor would bring the
> collision chance close to 0%.
> We may also introduce a separate Striped instance for each resource type, like
> volume, bucket, or key, to avoid cross-resource-type collisions. (Note that we
> haven't introduced key-level locks yet, but that is an important future
> project.)
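> A minimal sketch of the proposed approach using Guava's Striped; the stripe
> count (10x an assumed 100 worker threads), the per-resource-type split, and
> all class/method names below are illustrative assumptions, not a final design:
> {code:java}
> import com.google.common.util.concurrent.Striped;
> import java.util.concurrent.locks.Lock;
> import java.util.concurrent.locks.ReadWriteLock;
>
> class StripedResourceLocks {
>   // e.g. 10x an assumed 100 worker threads keeps the collision chance near zero
>   private static final int STRIPES = 10 * 100;
>
>   // separate stripes per resource type to avoid cross-resource-type collisions
>   private final Striped<ReadWriteLock> volumeLocks = Striped.readWriteLock(STRIPES);
>   private final Striped<ReadWriteLock> bucketLocks = Striped.readWriteLock(STRIPES);
>
>   void readBucket(String volume, String bucket, Runnable action) {
>     // locks are preallocated by Striped and never deallocated
>     Lock v = volumeLocks.get(volume).readLock();
>     Lock b = bucketLocks.get(volume + "/" + bucket).readLock();
>     v.lock();
>     b.lock();
>     try {
>       action.run();
>     } finally {
>       b.unlock();
>       v.unlock();
>     }
>   }
> }
> {code}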
> h3. Pros
> * Fast and scalable: eliminates the cost of dynamic allocation/deallocation
> of locks, especially for hotspots (e.g. hot bucket-level locks) or high
> cardinality (e.g. key-level locks).
> * Simple: no synchronization complexity, no deallocation, and an
> off-the-shelf library.
>
> * GC friendly.
> h3. Cons
> * Introduces a risk of lock collision, which can be largely eliminated by
> extending the lock array to e.g. 10x the number of threads (which is not
> expensive).
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]