----- Original Message -----
| Hi,
| 
| 
| On 30/06/17 19:18, Bob Peterson wrote:
| > Hi,
| >
| > This patch changes GFS2's LRU list from a single linked list with
| > a much-contended spin_lock to a hash table of linked lists, thus
| > having more locks with less contention. The key for the table is
| > set round-robin to evenly distribute the glocks into the buckets.
| >
| > Signed-off-by: Bob Peterson <[email protected]>
| This is a bit confused I think. There needs to be some more work done
| here to figure out exactly what the problem is, but I'm not very keen on
| this solution for a number of reasons...
| 
|   - It still claims to be LRU, but in fact it isn't any more
|   - The list might have been broken up, but the counter is still global,
| so that there is still the possibility for contention on that
|   - There is no performance data to say whether or not there is any
| improvement or not
|   - There should not in general be huge numbers of LRU operations on
| glocks anyway, so why is the LRU list contended in the first place?
|   - It introduces a new global counter at glock creation time, so
| something else that might land up being a contention point
| 
| The important question is what causes the contention on the LRU list in
| the first place - ideally we'd look at that issue in order to avoid so
| many list operations, rather than making it very complicated and non-LRU
| - at the very least there should be a justification for the new algorithm,
| 
| Steve.

Hi,

Let me see if I can address some of the issues you pointed out.

| There needs to be some more work done
| here to figure out exactly what the problem is

The problem, as I see it, is contention on the lru_lock, which is a single
point of contention between all GFS2 glock users as documented in this
bug: https://bugzilla.redhat.com/show_bug.cgi?id=1456886.

Before I posted the patch, I asked our performance group to run specsfs
on an instrumented kernel. The instrumented kernel basically tells me
how much total time was spent in each 30-second interval holding the
lru_lock, and how much time was spent waiting for it.
GFS2 only locks the lru_lock in four places, which I numbered 1 through 4.
The four are:

(1) gfs2_glock_add_to_lru
(2) gfs2_glock_remove_from_lru
(3) gfs2_glock_put
(4) gfs2_scan_glock_lru

The goal was to determine if the lru scanning code was holding the lru
lock for long periods of time. I examined the results in detail for the
duration of the specsfs run. Here is a very small excerpt of the output
that shows one of the anomalies:

Jun 14 15:21:34 dellfsvr1 kernel: held: 0 23 22 0 0   wait: 0 191 187 0 0
Jun 14 15:21:35 dellfsvr1 kernel: held: 0 110 115 0 0   wait: 0 563 523 1 0
Jun 14 15:21:50 dellfsvr1 kernel: held: 0 1541 1686 7 0   wait: 0 10596 10478 30 0
Jun 14 15:21:50 dellfsvr1 kernel: held: 0 67 74 1 0   wait: 0 412 453 0 0
Jun 14 15:21:51 dellfsvr1 kernel: held: 0 49 68 0 0   wait: 0 232 248 0 0

As this excerpt shows, the "held" times imply that no function is holding
the lru_lock for very long. The largest cumulative hold time was 1686
milliseconds for remove_from_lru over the 30-second reporting period,
followed closely by add_to_lru at 1541 ms. Functions glock_put and
scan_glock_lru were so small they barely register.

The "wait" times show that during a 30 second period, add_to_lru waited
a total of 10596 milliseconds, and remove_from_lru waited a total of 10478
milliseconds.

As an experiment, I tried disabling interrupts while the lru lock was
held. The result: absolutely no change in performance, so hold times are
apparently not being inflated by interrupts or context switches while
the lock is held.

My conclusion was that nobody is holding the lru lock for long; the
contention is just based on the sheer number of calls to lock it.

That's why I broke the lock into a hash table to split it up.

I'm going to ask the perf team to try a patched version with specsfs
because I can't seem to find the results of the last test.

(1) It still claims to be LRU, but in fact it isn't any more.

This is true. I don't know how much of a problem that is, but we
might be able to rearrange things to make it more like one.

(2) The list might have been broken up, but the counter is still global.

The counter is just an atomic, so it seems unlikely to me that it will be
contended enough to make a difference. Worst case, it would be no worse
than what we've got now. Hopefully the performance numbers will reflect that.

(3) There is no performance data to say whether or not there is any
improvement.

Again, I'll try to get those numbers.

(4) There should not in general be huge numbers of LRU operations on
glocks anyway, so why is the LRU list contended in the first place?

The glocks are being added to and taken off the lru list a lot.
If there aren't huge numbers of lru operations, there is no need for
bz#1456886. The "why is it contended in the first place" is explained
above: It's the sheer number of processes hitting that lock at any
given time.

(5) It introduces a new global counter at glock creation time, so
something else that might land up being a contention point.

Point taken. The performance numbers should hopefully reflect whether
that's true.

Stay tuned and I'll see if I can get some performance statistics.

Regards,

Bob Peterson
Red Hat File Systems
