* Motivation

The purpose of this hierarchical split-counter scheme is to:

- Minimize contention when incrementing and decrementing counters,
- Provide fast access to a sum approximation,
- Provide a sum approximation with an acceptable accuracy level when
  scaling to many-core systems,
- Provide approximate and precise comparison of two counters, and
  between a counter and a value.

It aims to fix per-mm RSS tracking, which has become too inaccurate
for OOM killer purposes on large many-core systems [1]. A brief usage
sketch of the proposed API follows.
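
The sketch below is illustrative only and is not part of the patch; it
shows how a caller could use the API declared in the new
include/linux/percpu_counter_tree.h, with a hypothetical "limit"
threshold:

    struct percpu_counter_tree counter;
    int ret, approx, limit = 1000;      /* hypothetical threshold */

    /* Batch size must be a power of 2. */
    ret = percpu_counter_tree_init(&counter, 32, GFP_KERNEL);
    if (ret)
            return ret;

    percpu_counter_tree_add(&counter, 1);       /* per-CPU fast path */
    percpu_counter_tree_add(&counter, -1);

    /* Cheap read of the root approximation (within ±inaccuracy). */
    approx = percpu_counter_tree_approximate_sum(&counter);

    /* Precise comparison against the threshold. */
    if (percpu_counter_tree_precise_compare_value(&counter, limit) > 0)
            pr_info("counter exceeds limit\n");

    percpu_counter_tree_destroy(&counter);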

* Design

The hierarchical per-CPU counters propagate a sum approximation through
an N-way tree. When a level reaches its batch size, the carry is
propagated up the tree, which consists of log_N(nr_cpu_ids) levels. The
batch size of each level is the batch size of the prior level
multiplied by the N-arity of that prior level (i.e. doubled at each
level of a binary tree).

Example propagation diagram with 8 cpus through a binary tree:

Level 0:  0    1    2    3    4    5    6    7
          |   /     |   /     |   /     |   /
          |  /      |  /      |  /      |  /
          | /       | /       | /       | /
Level 1:  0         1         2         3
          |       /           |       /
          |    /              |    /
          | /                 | /
Level 2:  0                   1
          |               /
          |         /
          |   /
Level 3:  0
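
As an illustration of the carry extraction (with made-up numbers): for
a level-0 batch size of 32, the carry bit mask is 0x20. If a per-CPU
counter holding 30 is incremented by 5, it reaches 35; bit 5 flips from
0 to 1, so percpu_counter_tree_carry() turns the increment into a carry
of 32 that is forwarded to level 1, while the remaining 3 stays encoded
in the low bits of the level-0 counter.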

For a binary tree, the maximum inaccuracy is bounded by:
   batch_size * log2(nr_cpus) * nr_cpus
which grows as O(n * log(n)) as the number of CPUs increases.
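
For example, with batch_size = 32 on an 8-CPU system, this bound is
32 * log2(8) * 8 = 32 * 3 * 8 = 768, matching the ±768 global
inaccuracy worked out in the comment at the top of
lib/percpu_counter_tree.c.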

For an N-way tree, the maximum inaccuracy can be pre-calculated from
the N-arity of each level and the batch size, as sketched below.
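
The following sketch mirrors the calculate_inaccuracy_multiplier()
helper added by this patch; the function name max_inaccuracy() and its
parameters are illustrative and not part of the patch:

    /*
     * Each level contributes (number of counters at that level) *
     * (that level's batch size); level_batch tracks the batch size
     * relative to the level-0 batch, so the result is scaled by
     * batch_size at the end.
     */
    static unsigned int max_inaccuracy(unsigned int batch_size,
                                       unsigned int nr_cpus_order,
                                       const unsigned char *n_arity_order,
                                       unsigned int nr_levels)
    {
            unsigned int level_items = 1U << nr_cpus_order;
            unsigned int level_batch = 1, multiplier = 0;
            unsigned int level;

            for (level = 0; level < nr_levels; level++) {
                    multiplier += level_batch * level_items;
                    level_batch <<= n_arity_order[level];
                    level_items >>= n_arity_order[level];
            }
            return batch_size * multiplier;
    }

For 8 CPUs arranged as a binary tree (arity orders { 1, 1, 1 }) and a
batch size of 32, this returns 32 * (1*8 + 2*4 + 4*2) = 768, consistent
with the bound above.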

Link: https://lore.kernel.org/lkml/20250331223516.7810-2-sweettea-ker...@dorminy.me/ # [1]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: "Paul E. McKenney" <paul...@kernel.org>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Masami Hiramatsu <mhira...@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
Cc: Dennis Zhou <den...@kernel.org>
Cc: Tejun Heo <t...@kernel.org>
Cc: Christoph Lameter <c...@linux.com>
Cc: Martin Liu <liumar...@google.com>
Cc: David Rientjes <rient...@google.com>
Cc: christian.koe...@amd.com
Cc: Shakeel Butt <shakeel.b...@linux.dev>
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Sweet Tea Dorminy <sweet...@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoa...@oracle.com>
Cc: "Liam R . Howlett" <liam.howl...@oracle.com>
Cc: Suren Baghdasaryan <sur...@google.com>
Cc: Vlastimil Babka <vba...@suse.cz>
Cc: Christian Brauner <brau...@kernel.org>
Cc: Wei Yang <richard.weiy...@gmail.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Miaohe Lin <linmia...@huawei.com>
Cc: Al Viro <v...@zeniv.linux.org.uk>
Cc: linux...@kvack.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: Yu Zhao <yuz...@google.com>
Cc: Roman Gushchin <roman.gushc...@linux.dev>
Cc: Mateusz Guzik <mjgu...@gmail.com>
Cc: Matthew Wilcox <wi...@infradead.org>
---
Changes since v3:
- Add gfp flags to init function.

Changes since v2:
- Introduce N-way tree to reduce tree depth on larger systems.

Changes since v1:
- Remove percpu_counter_tree_precise_sum_unbiased from public header,
  make this function static,
- Introduce precise and approximate comparisons between two counters,
- Reorder the struct percpu_counter_tree fields,
- Introduce approx_sum field, which points to the approximate sum
  for the percpu_counter_tree_approximate_sum() fast path.
---
 include/linux/percpu_counter_tree.h | 108 ++++++++
 lib/Makefile                        |   1 +
 lib/percpu_counter_tree.c           | 393 ++++++++++++++++++++++++++++
 3 files changed, 502 insertions(+)
 create mode 100644 include/linux/percpu_counter_tree.h
 create mode 100644 lib/percpu_counter_tree.c

diff --git a/include/linux/percpu_counter_tree.h b/include/linux/percpu_counter_tree.h
new file mode 100644
index 000000000000..cb7a97c8a823
--- /dev/null
+++ b/include/linux/percpu_counter_tree.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR MIT */
+/* SPDX-FileCopyrightText: 2025 Mathieu Desnoyers <mathieu.desnoy...@efficios.com> */
+
+#ifndef _PERCPU_COUNTER_TREE_H
+#define _PERCPU_COUNTER_TREE_H
+
+#include <linux/cleanup.h>
+#include <linux/preempt.h>
+#include <linux/atomic.h>
+#include <linux/percpu.h>
+
+struct percpu_counter_tree_level_item {
+       atomic_t count;
+} ____cacheline_aligned_in_smp;
+
+struct percpu_counter_tree {
+       /* Fast-path fields. */
+       unsigned int __percpu *level0;
+       unsigned int level0_bit_mask;
+       union {
+               unsigned int *i;
+               atomic_t *a;
+       } approx_sum;
+       int bias;                       /* bias for counter_set */
+
+       /* Slow-path fields. */
+       struct percpu_counter_tree_level_item *items;
+       unsigned int batch_size;
+       unsigned int inaccuracy;        /* approximation imprecise within ± inaccuracy */
+};
+
+int percpu_counter_tree_init(struct percpu_counter_tree *counter, unsigned int batch_size, gfp_t gfp_flags);
+void percpu_counter_tree_destroy(struct percpu_counter_tree *counter);
+void percpu_counter_tree_add_slowpath(struct percpu_counter_tree *counter, int inc);
+int percpu_counter_tree_precise_sum(struct percpu_counter_tree *counter);
+int percpu_counter_tree_approximate_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b);
+int percpu_counter_tree_approximate_compare_value(struct percpu_counter_tree *counter, int v);
+int percpu_counter_tree_precise_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b);
+int percpu_counter_tree_precise_compare_value(struct percpu_counter_tree *counter, int v);
+void percpu_counter_tree_set_bias(struct percpu_counter_tree *counter, int bias);
+void percpu_counter_tree_set(struct percpu_counter_tree *counter, int v);
+unsigned int percpu_counter_tree_inaccuracy(struct percpu_counter_tree *counter);
+
+/* Fast paths */
+
+static inline
+int percpu_counter_tree_carry(int orig, int res, int inc, unsigned int bit_mask)
+{
+       if (inc < 0) {
+               inc = -(-inc & ~(bit_mask - 1));
+               /*
+                * xor bit_mask: underflow.
+                *
+                * If inc has bit set, decrement an additional bit if
+                * there is _no_ bit transition between orig and res.
+                * Else, inc has bit cleared, decrement an additional
+                * bit if there is a bit transition between orig and
+                * res.
+                */
+               if ((inc ^ orig ^ res) & bit_mask)
+                       inc -= bit_mask;
+       } else {
+               inc &= ~(bit_mask - 1);
+               /*
+                * xor bit_mask: overflow.
+                *
+                * If inc has bit set, increment an additional bit if
+                * there is _no_ bit transition between orig and res.
+                * Else, inc has bit cleared, increment an additional
+                * bit if there is a bit transition between orig and
+                * res.
+                */
+               if ((inc ^ orig ^ res) & bit_mask)
+                       inc += bit_mask;
+       }
+       return inc;
+}
+
+static inline
+void percpu_counter_tree_add(struct percpu_counter_tree *counter, int inc)
+{
+       unsigned int bit_mask = counter->level0_bit_mask, orig, res;
+
+       if (!inc)
+               return;
+       /* Make sure the fast and slow paths use the same cpu number. */
+       guard(migrate)();
+       res = this_cpu_add_return(*counter->level0, inc);
+       orig = res - inc;
+       inc = percpu_counter_tree_carry(orig, res, inc, bit_mask);
+       if (!inc)
+               return;
+       percpu_counter_tree_add_slowpath(counter, inc);
+}
+
+static inline
+int percpu_counter_tree_approximate_sum(struct percpu_counter_tree *counter)
+{
+       unsigned int v;
+
+       if (!counter->level0_bit_mask)
+               v = READ_ONCE(*counter->approx_sum.i);
+       else
+               v = atomic_read(counter->approx_sum.a);
+       return (int) (v + (unsigned int)READ_ONCE(counter->bias));
+}
+
+#endif  /* _PERCPU_COUNTER_TREE_H */
diff --git a/lib/Makefile b/lib/Makefile
index f07b24ce1b3f..1d86c97ee39b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -191,6 +191,7 @@ obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
 obj-$(CONFIG_TEXTSEARCH_BM) += ts_bm.o
 obj-$(CONFIG_TEXTSEARCH_FSM) += ts_fsm.o
 obj-$(CONFIG_SMP) += percpu_counter.o
+obj-$(CONFIG_SMP) += percpu_counter_tree.o
 obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o
 
diff --git a/lib/percpu_counter_tree.c b/lib/percpu_counter_tree.c
new file mode 100644
index 000000000000..da6fccf9f883
--- /dev/null
+++ b/lib/percpu_counter_tree.c
@@ -0,0 +1,393 @@
+// SPDX-License-Identifier: GPL-2.0+ OR MIT
+// SPDX-FileCopyrightText: 2025 Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
+
+/*
+ * Split Counters With Tree Approximation Propagation
+ *
+ * * Propagation diagram when reaching batch size thresholds (± batch size):
+ *
+ * Example diagram for 8 CPUs:
+ *
+ * log2(8) = 3 levels
+ *
+ * At each level, each pair propagates its values to the next level when
+ * reaching the batch size thresholds.
+ *
+ * Counters at levels 0, 1, 2 can be kept on a single byte (±128 range),
+ * although it may be relevant to keep them on 32-bit counters for
+ * simplicity. (complexity vs memory footprint tradeoff)
+ *
+ * Counter at level 3 can be kept on a 32-bit counter.
+ *
+ * Level 0:  0    1    2    3    4    5    6    7
+ *           |   /     |   /     |   /     |   /
+ *           |  /      |  /      |  /      |  /
+ *           | /       | /       | /       | /
+ * Level 1:  0         1         2         3
+ *           |       /           |       /
+ *           |    /              |    /
+ *           | /                 | /
+ * Level 2:  0                   1
+ *           |               /
+ *           |         /
+ *           |   /
+ * Level 3:  0
+ *
+ * * Approximation inaccuracy:
+ *
+ * BATCH(level N): Level N batch size.
+ *
+ * Example for BATCH(level 0) = 32.
+ *
+ * BATCH(level 0) =  32
+ * BATCH(level 1) =  64
+ * BATCH(level 2) = 128
+ * BATCH(level N) = BATCH(level 0) * 2^N
+ *
+ *            per-counter     global
+ *            inaccuracy      inaccuracy
+ * Level 0:   [ -32 ..  +31]  ±256  (8 * 32)
+ * Level 1:   [ -64 ..  +63]  ±256  (4 * 64)
+ * Level 2:   [-128 .. +127]  ±256  (2 * 128)
+ * Total:      ------         ±768  (log2(nr_cpu_ids) * BATCH(level 0) * nr_cpu_ids)
+ *
+ * -----
+ *
+ * Approximate Sum Carry Propagation
+ *
+ * Let's define a number of counter bits for each level, e.g.:
+ *
+ * log2(BATCH(level 0)) = log2(32) = 5
+ *
+ *               nr_bit        value_mask                      range
+ * Level 0:      5 bits        v                             0 ..  +31
+ * Level 1:      1 bit        (v & ~((1UL << 5) - 1))        0 ..  +63
+ * Level 2:      1 bit        (v & ~((1UL << 6) - 1))        0 .. +127
+ * Level 3:     25 bits       (v & ~((1UL << 7) - 1))        0 .. 2^32-1
+ *
+ * Note: Use a full 32-bit per-cpu counter at level 0 to allow precise sum.
+ *
+ * Note: Use cacheline aligned counters at levels above 0 to prevent false sharing.
+ *       If memory footprint is an issue, a specialized allocator could be used
+ *       to eliminate padding.
+ *
+ * Example with expanded values:
+ *
+ * counter_add(counter, inc):
+ *
+ *         if (!inc)
+ *                 return;
+ *
+ *         res = percpu_add_return(counter @ Level 0, inc);
+ *         orig = res - inc;
+ *         if (inc < 0) {
+ *                 inc = -(-inc & ~0b00011111);  // Clear used bits
+ *                 // xor bit 5: underflow
+ *                 if ((inc ^ orig ^ res) & 0b00100000)
+ *                         inc -= 0b00100000;
+ *         } else {
+ *                 inc &= ~0b00011111;           // Clear used bits
+ *                 // xor bit 5: overflow
+ *                 if ((inc ^ orig ^ res) & 0b00100000)
+ *                         inc += 0b00100000;
+ *         }
+ *         if (!inc)
+ *                 return;
+ *
+ *         res = atomic_add_return(counter @ Level 1, inc);
+ *         orig = res - inc;
+ *         if (inc < 0) {
+ *                 inc = -(-inc & ~0b00111111);  // Clear used bits
+ *                 // xor bit 6: underflow
+ *                 if ((inc ^ orig ^ res) & 0b01000000)
+ *                         inc -= 0b01000000;
+ *         } else {
+ *                 inc &= ~0b00111111;           // Clear used bits
+ *                 // xor bit 6: overflow
+ *                 if ((inc ^ orig ^ res) & 0b01000000)
+ *                         inc += 0b01000000;
+ *         }
+ *         if (!inc)
+ *                 return;
+ *
+ *         res = atomic_add_return(counter @ Level 2, inc);
+ *         orig = res - inc;
+ *         if (inc < 0) {
+ *                 inc = -(-inc & ~0b01111111);  // Clear used bits
+ *                 // xor bit 7: underflow
+ *                 if ((inc ^ orig ^ res) & 0b10000000)
+ *                         inc -= 0b10000000;
+ *         } else {
+ *                 inc &= ~0b01111111;           // Clear used bits
+ *                 // xor bit 7: overflow
+ *                 if ((inc ^ orig ^ res) & 0b10000000)
+ *                         inc += 0b10000000;
+ *         }
+ *         if (!inc)
+ *                 return;
+ *
+ *         atomic_add(counter @ Level 3, inc);
+ */
+
+#include <linux/percpu_counter_tree.h>
+#include <linux/cpumask.h>
+#include <linux/percpu.h>
+#include <linux/atomic.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/math.h>
+
+#define MAX_NR_LEVELS 5
+
+struct counter_config {
+       unsigned int nr_items;
+       unsigned char nr_levels;
+       unsigned char n_arity_order[MAX_NR_LEVELS];
+};
+
+/*
+ * nr_items is the number of items in the tree for levels 1 to and
+ * including the final level (approximate sum). It excludes the level 0
+ * per-cpu counters.
+ */
+static const struct counter_config per_nr_cpu_order_config[] = {
+       [0] =   { .nr_items = 1,        .nr_levels = 0,         .n_arity_order = { 0 } },
+       [1] =   { .nr_items = 3,        .nr_levels = 1,         .n_arity_order = { 1 } },
+       [2] =   { .nr_items = 3,        .nr_levels = 2,         .n_arity_order = { 1, 1 } },
+       [3] =   { .nr_items = 7,        .nr_levels = 3,         .n_arity_order = { 1, 1, 1 } },
+       [4] =   { .nr_items = 7,        .nr_levels = 3,         .n_arity_order = { 2, 1, 1 } },
+       [5] =   { .nr_items = 11,       .nr_levels = 3,         .n_arity_order = { 2, 2, 1 } },
+       [6] =   { .nr_items = 21,       .nr_levels = 3,         .n_arity_order = { 2, 2, 2 } },
+       [7] =   { .nr_items = 21,       .nr_levels = 3,         .n_arity_order = { 3, 2, 2 } },
+       [8] =   { .nr_items = 37,       .nr_levels = 3,         .n_arity_order = { 3, 3, 2 } },
+       [9] =   { .nr_items = 73,       .nr_levels = 3,         .n_arity_order = { 3, 3, 3 } },
+       [10] =  { .nr_items = 149,      .nr_levels = 4,         .n_arity_order = { 3, 3, 2, 2 } },
+       [11] =  { .nr_items = 293,      .nr_levels = 4,         .n_arity_order = { 3, 3, 3, 2 } },
+       [12] =  { .nr_items = 585,      .nr_levels = 4,         .n_arity_order = { 3, 3, 3, 3 } },
+       [13] =  { .nr_items = 1173,     .nr_levels = 5,         .n_arity_order = { 3, 3, 3, 2, 2 } },
+       [14] =  { .nr_items = 2341,     .nr_levels = 5,         .n_arity_order = { 3, 3, 3, 3, 2 } },
+       [15] =  { .nr_items = 4681,     .nr_levels = 5,         .n_arity_order = { 3, 3, 3, 3, 3 } },
+       [16] =  { .nr_items = 4681,     .nr_levels = 5,         .n_arity_order = { 4, 3, 3, 3, 3 } },
+       [17] =  { .nr_items = 8777,     .nr_levels = 5,         .n_arity_order = { 4, 4, 3, 3, 3 } },
+       [18] =  { .nr_items = 17481,    .nr_levels = 5,         .n_arity_order = { 4, 4, 4, 3, 3 } },
+       [19] =  { .nr_items = 34953,    .nr_levels = 5,         .n_arity_order = { 4, 4, 4, 4, 3 } },
+       [20] =  { .nr_items = 69905,    .nr_levels = 5,         .n_arity_order = { 4, 4, 4, 4, 4 } },
+};
+
+static const struct counter_config *counter_config;
+static unsigned int nr_cpus_order, inaccuracy_multiplier;
+
+int percpu_counter_tree_init(struct percpu_counter_tree *counter, unsigned int batch_size, gfp_t gfp_flags)
+{
+       /* Batch size must be power of 2 */
+       if (!batch_size || (batch_size & (batch_size - 1)))
+               return -EINVAL;
+       counter->batch_size = batch_size;
+       counter->bias = 0;
+       counter->level0 = alloc_percpu_gfp(unsigned int, gfp_flags);
+       if (!counter->level0)
+               return -ENOMEM;
+       if (!nr_cpus_order) {
+               counter->items = NULL;
+               counter->approx_sum.i = per_cpu_ptr(counter->level0, 0);
+               counter->level0_bit_mask = 0;
+       } else {
+               counter->items = kzalloc(counter_config->nr_items *
+                                        sizeof(struct percpu_counter_tree_level_item),
+                                        gfp_flags);
+               if (!counter->items) {
+                       free_percpu(counter->level0);
+                       return -ENOMEM;
+               }
+               counter->approx_sum.a = &counter->items[counter_config->nr_items - 1].count;
+               counter->level0_bit_mask = 1UL << get_count_order(batch_size);
+       }
+       counter->inaccuracy = batch_size * inaccuracy_multiplier;
+       return 0;
+}
+
+void percpu_counter_tree_destroy(struct percpu_counter_tree *counter)
+{
+       free_percpu(counter->level0);
+       kfree(counter->items);
+}
+
+/* Called with migration disabled. */
+void percpu_counter_tree_add_slowpath(struct percpu_counter_tree *counter, int inc)
+{
+       unsigned int level_items, nr_levels = counter_config->nr_levels,
+                    level, n_arity_order, bit_mask;
+       struct percpu_counter_tree_level_item *item = counter->items;
+       unsigned int cpu = smp_processor_id();
+
+       WARN_ON_ONCE(!nr_cpus_order);   /* Should never be called for 1 cpu. */
+
+       n_arity_order = counter_config->n_arity_order[0];
+       bit_mask = counter->level0_bit_mask << n_arity_order;
+       level_items = 1U << (nr_cpus_order - n_arity_order);
+
+       for (level = 1; level < nr_levels; level++) {
+               atomic_t *count = &item[cpu & (level_items - 1)].count;
+               unsigned int orig, res;
+
+               res = atomic_add_return_relaxed(inc, count);
+               orig = res - inc;
+               inc = percpu_counter_tree_carry(orig, res, inc, bit_mask);
+               if (!inc)
+                       return;
+               item += level_items;
+               n_arity_order = counter_config->n_arity_order[level];
+               level_items >>= n_arity_order;
+               bit_mask <<= n_arity_order;
+       }
+       atomic_add(inc, counter->approx_sum.a);
+}
+
+/*
+ * Precise sum. Perform the sum of all per-cpu counters.
+ */
+static int percpu_counter_tree_precise_sum_unbiased(struct percpu_counter_tree *counter)
+{
+       unsigned int sum = 0;
+       int cpu;
+
+       for_each_possible_cpu(cpu)
+               sum += *per_cpu_ptr(counter->level0, cpu);
+       return (int) sum;
+}
+
+int percpu_counter_tree_precise_sum(struct percpu_counter_tree *counter)
+{
+       return percpu_counter_tree_precise_sum_unbiased(counter) + READ_ONCE(counter->bias);
+}
+
+/*
+ * Do an approximate comparison of two counters.
+ * Return 0 if counters do not differ by more than the sum of their
+ * respective inaccuracy ranges,
+ * Return -1 if counter @a is less than counter @b,
+ * Return 1 if counter @a is greater than counter @b.
+ */
+int percpu_counter_tree_approximate_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b)
+{
+       int count_a = percpu_counter_tree_approximate_sum(a),
+           count_b = percpu_counter_tree_approximate_sum(b);
+
+       if (abs(count_a - count_b) <= (a->inaccuracy + b->inaccuracy))
+               return 0;
+       if (count_a < count_b)
+               return -1;
+       return 1;
+}
+
+/*
+ * Do an approximate comparison of a counter against a given value.
+ * Return 0 if the value is within the inaccuracy range of the counter,
+ * Return -1 if the counter is less than the value,
+ * Return 1 if the counter is greater than the value.
+ */
+int percpu_counter_tree_approximate_compare_value(struct percpu_counter_tree *counter, int v)
+{
+       int count = percpu_counter_tree_approximate_sum(counter);
+
+       if (abs(v - count) <= counter->inaccuracy)
+               return 0;
+       if (count < v)
+               return -1;
+       return 1;
+}
+
+/*
+ * Do a precise comparison of two counters.
+ * Return 0 if the counters are equal,
+ * Return -1 if counter @a is less than counter @b,
+ * Return 1 if counter @a is greater than counter @b.
+ */
+int percpu_counter_tree_precise_compare(struct percpu_counter_tree *a, struct percpu_counter_tree *b)
+{
+       int count_a = percpu_counter_tree_approximate_sum(a),
+           count_b = percpu_counter_tree_approximate_sum(b);
+
+       if (abs(count_a - count_b) <= (a->inaccuracy + b->inaccuracy)) {
+               if (b->inaccuracy < a->inaccuracy) {
+                       count_a = percpu_counter_tree_precise_sum(a);
+                       if (abs(count_a - count_b) <= b->inaccuracy)
+                               count_b = percpu_counter_tree_precise_sum(b);
+               } else {
+                       count_b = percpu_counter_tree_precise_sum(b);
+                       if (abs(count_a - count_b) <= a->inaccuracy)
+                               count_a = percpu_counter_tree_precise_sum(a);
+               }
+       }
+       if (count_a < count_b)
+               return -1;
+       if (count_a > count_b)
+               return 1;
+       return 0;
+}
+
+/*
+ * Do a precise comparison of a counter against a given value.
+ * Return 0 if the counter is equal to the value,
+ * Return -1 if the counter is less than the value,
+ * Return 1 if the counter is greater than the value.
+ */
+int percpu_counter_tree_precise_compare_value(struct percpu_counter_tree *counter, int v)
+{
+       int count = percpu_counter_tree_approximate_sum(counter);
+
+       if (abs(v - count) <= counter->inaccuracy)
+               count = percpu_counter_tree_precise_sum(counter);
+       if (count < v)
+               return -1;
+       if (count > v)
+               return 1;
+       return 0;
+}
+
+void percpu_counter_tree_set_bias(struct percpu_counter_tree *counter, int bias)
+{
+       WRITE_ONCE(counter->bias, bias);
+}
+
+void percpu_counter_tree_set(struct percpu_counter_tree *counter, int v)
+{
+       percpu_counter_tree_set_bias(counter,
+                                    v - percpu_counter_tree_precise_sum_unbiased(counter));
+}
+
+unsigned int percpu_counter_tree_inaccuracy(struct percpu_counter_tree *counter)
+{
+       return counter->inaccuracy;
+}
+
+static unsigned int __init calculate_inaccuracy_multiplier(void)
+{
+       unsigned int nr_levels = counter_config->nr_levels, level;
+       unsigned int level_items = 1U << nr_cpus_order;
+       unsigned int inaccuracy = 0, batch_size = 1;
+
+       for (level = 0; level < nr_levels; level++) {
+               unsigned int n_arity_order = counter_config->n_arity_order[level];
+
+               inaccuracy += batch_size * level_items;
+               batch_size <<= n_arity_order;
+               level_items >>= n_arity_order;
+       }
+       return inaccuracy;
+}
+
+static int __init percpu_counter_startup(void)
+{
+
+       nr_cpus_order = get_count_order(nr_cpu_ids);
+       if (WARN_ON_ONCE(nr_cpus_order >= ARRAY_SIZE(per_nr_cpu_order_config))) {
+               printk(KERN_ERR "Unsupported number of CPUs (%u)\n", nr_cpu_ids);
+               return -1;
+       }
+       counter_config = &per_nr_cpu_order_config[nr_cpus_order];
+       inaccuracy_multiplier = calculate_inaccuracy_multiplier();
+       return 0;
+}
+module_init(percpu_counter_startup);
-- 
2.39.5

