[
https://issues.apache.org/jira/browse/HDDS-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duong updated HDDS-8765:
------------------------
Description:
h2. Problem
Today, OM manages resource (volume, bucket) locks through LockManager, a component
that wraps a lock table (a ConcurrentHashMap) holding all active locks. Locks
are dynamically allocated in and removed from the lock table based on runtime
needs. This means that, for every allocated lock, a usage count must be kept up
to date to decide when the lock is no longer referenced.
The performance of LockManager is limited by the cost of maintaining individual
lock liveness, i.e., counting how many threads are concurrently using a lock
and removing it from the lock table when it is no longer used.
This cost is mainly incurred by the need to *synchronize* all concurrent
access to each lock (technically, its ConcurrentHashMap bin):
# Acquiring the lock: create the lock object if it does not yet exist in the
table and increment its usage count.
# Releasing the lock: decrement the usage count and remove the lock once the
count reaches 0.
!Screenshot 2023-06-05 at 4.31.14 PM.png|width=764,height=243!
This synchronization is done internally by two ConcurrentHashMap methods,
_compute_ and {_}computeIfPresent{_}.
It creates a bottleneck when multiple threads try to obtain and release the
same lock, even for read locks.
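For illustration, here is a minimal sketch (hypothetical class and method names, not the actual OM code) of a reference-counted lock table built on _compute_ / {_}computeIfPresent{_}; every acquire and release funnels through a bin-level lock inside the map:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified, hypothetical version of a dynamically managed lock table.
class RefCountedLockTable {
  private static final class Entry {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int refCount;
  }

  private final ConcurrentHashMap<String, Entry> table = new ConcurrentHashMap<>();

  ReentrantReadWriteLock acquire(String resource) {
    // compute() locks the bin holding 'resource', even for pure readers.
    return table.compute(resource, (k, v) -> {
      if (v == null) {
        v = new Entry();   // allocate the lock on first use
      }
      v.refCount++;        // track liveness
      return v;
    }).lock;
  }

  void release(String resource) {
    // computeIfPresent() locks the bin again; drop the entry at refCount == 0.
    table.computeIfPresent(resource, (k, v) -> {
      v.refCount--;
      return v.refCount == 0 ? null : v;
    });
  }
}
{code}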
h2. Experiment
I ran an experiment of pure OM key reads against the same bucket with 100 reader
threads. The freon command looks like the following:
{code:java}
ozone freon ockrw -v duong -b obs --contiguous -n 50000000 -p zerobytes --percentage-read=100 -r 5000000 -s 0 --size 0 -m -t 100
{code}
With the current code, total throughput tops out at ~100K ops/s and getKeyInfo
latency is ~800μs. The time spent acquiring and releasing a lock is ~40μs. Note
that for each getKeyInfo request, OM obtains and releases the volume and bucket
locks multiple times.
With a [quick and naive
change|https://github.com/duongkame/ozone/commit/91c2729cef1649f038b1560d260d41f91de4c0b2]
that removes the synchronization when getting and releasing the locks,
getKeyInfo latency drops to ~400μs and total throughput rises to ~160K ops/s.
The time to acquire and release a lock drops to 4-5μs. Please note that this
change only demonstrates the impact of the synchronization and is not
practical, as synchronization is essential to the correctness of the dynamic
lock management.
h2. Proposed solution
It's been shown that dynamic lock allocation (and deallocation) per resource is
costly, especially for hotspots like bucket/volume-level locks. The
synchronization is costly even for single-threaded access.
A simpler and more performant solution is lock striping: preallocate an array
of locks and select the lock for a resource based on its hashcode % array_size.
Locks are preallocated, reused and never deallocated.
Implementation-wise, we don't need to reinvent the wheel; we can use Guava's
[Striped|https://guava.dev/releases/19.0/api/docs/com/google/common/util/concurrent/Striped.html].
To minimize the chance of collision (different resources mapping to the same
lock), the array size (the number of stripes in Guava terms) can be a multiple
of the number of worker threads. A 10x factor keeps the collision probability
close to zero (for any two given resources, the chance of sharing a stripe is
1/array_size).
We may also introduce a separate Striped instance per resource type (volume,
bucket, key) to avoid cross-resource-type collisions, as in the sketch below.
(Note that we haven't introduced key-level locks yet, but they are an important
future project.)
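A minimal sketch of what this could look like (hypothetical class name and stripe count, assuming one Striped instance per resource type; not the actual patch):
{code:java}
import com.google.common.util.concurrent.Striped;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;

// Hypothetical sketch: preallocated, striped read-write locks per resource type.
class StripedOzoneLocks {
  // ~10x the number of worker threads, e.g. 1024 stripes for 100 handler threads.
  private static final int STRIPES = 1024;

  private final Striped<ReadWriteLock> volumeLocks = Striped.readWriteLock(STRIPES);
  private final Striped<ReadWriteLock> bucketLocks = Striped.readWriteLock(STRIPES);

  void withBucketReadLock(String volume, String bucket, Runnable action) {
    // The stripe is picked by the key's hash; no allocation, no reference counting.
    Lock v = volumeLocks.get(volume).readLock();
    Lock b = bucketLocks.get(volume + "/" + bucket).readLock();
    v.lock();
    b.lock();
    try {
      action.run();
    } finally {
      b.unlock();
      v.unlock();
    }
  }
}
{code}
Because the stripes are fixed, acquiring a lock becomes a hash plus an array lookup; the reference counting and compute()-based synchronization of the current LockManager disappear entirely.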
h3. Pros
* Fast and scalable; eliminates the cost of dynamic allocation/deallocation of
locks, especially for hotspots (e.g. hot bucket-level locks) or high-cardinality
resources (e.g. key-level locks).
* Simple, no synchronization complexity.
* GC friendly.
h3. Cons
* Introduces a risk of lock collision, which can be practically eliminated by
extending the lock array to e.g. 10x the number of threads (which is not
expensive).
> OM lock performance improvement
> -------------------------------
>
> Key: HDDS-8765
> URL: https://issues.apache.org/jira/browse/HDDS-8765
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Duong
> Priority: Major
> Attachments: Screenshot 2023-06-05 at 4.31.14 PM.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]