Hi Mathieu,
Thanks for posting this.

> On Dec 17, 2025, at 8:45 PM, Mathieu Desnoyers 
> <[email protected]> wrote:
> 
> Hi,
> 
> Here is a revisited version of my Hazard Pointers series. Boqun, Joel,
> if you guys have time to try it out with your use-cases it would be
> great!
> 
> This new version does the following:
> 
> - It has 8 preallocated hazard pointer slots per CPU (one cache line),
> - The hazard pointer user allocates a hazard pointer context variable
>  (typically on the stack), which contains the pointer to the slot *and*
>  a backup slot,
> - When all the per-CPU slots are in use, fallback to the backup slot.
>  Chain the backup slot into per-CPU lists, each protected by a raw
>  spinlock.
> - The hazard pointer synchronize does a piecewise iteration on the
>  per-CPU overflow slots lists, releasing the raw spinlock between
>  each list item. It uses a 64-bit generation counter to check for
>  concurrent list changes, and restart the traversal on generation
>  counter mismatch.
> - There is a new CONFIG_PREEMPT_HAZPTR config option. When enabled,
>  the hazard pointer acquire/release adds and then removes the hazard
>  pointer context from a per-task linked list. On context switch, the
>  scheduler migrates the per-CPU slots used by the task to the backup
>  per-context slots, thus making sure the per-CPU slots are not used
>  by preempted and blocked tasks.

This last point is another reason why I want the slots to be per task instead 
of per CPU. It becomes very natural because, at the use-case level, a hazard 
pointer is always associated with a task anyway, not with a CPU. By putting 
the slots in the task struct, these requirements fall out naturally without 
requiring any locking or list management. Did I miss something about the use 
cases?

I did some measurements on the task-scanning issue, and it is fast in my 
testing (~1ms per 10000 tasks). Any input from you or anyone else on the 
typical task-count distribution we are addressing? I also made a rough 
prototype, and it appears to be simpler, with fewer lines of code, because I 
do not need to handle preemption specially; that just falls out naturally.

First of all, we can have a per-task counter that tracks how many hazard 
pointers are active. If it is zero, we can simply skip the task instead of 
wasting cycles scanning all of its slots. Further, we can keep a retire list 
and amortize a single scan across all the objects on that list, sharing the 
scan cost. This would also help implement asynchronous object retiring via a 
dedicated thread, perhaps using the tasks-RCU infrastructure. We could also 
turn the per-task counter into a bitmap to potentially speed up scanning.

I am okay with the concept of an overflow list, but if we keep the overflow 
list at the per-task level instead of the per-CPU level, it is highly 
unlikely IMO that the overflow list will be used at all, unless more than, 
say, eight hazard pointers are active in a single task at any given time. So 
its lock contention would be rarer than with a per-CPU overflow list. I would 
say contention would be incredibly rare, because hazard pointers are 
typically used by multiple tasks, each of which has its own unique set of 
slots. With a per-CPU overflow approach, by contrast, we have a higher chance 
of lock contention, especially when the number of CPUs is low.

Other than the task-scanning performance issue, what am I missing?

Another nice benefit of using per-task hazard pointers is that we can also 
allow sleeping in hazard pointer sections, because we will be scanning 
sleeping tasks as well.

By contrast, the other approaches I have seen with per-CPU hazard pointers 
forbid sleeping, since after sleeping a task is no longer associated with its 
CPU. Those approaches also have a higher likelihood of taking locks due to 
running out of slots.

Admittedly, I do not yet have a concrete use case, but I suspect we can find 
a per-CPU refcount use case that benefits from this. I am researching use 
cases as I get time; I think my next task is to find a solid use case before 
doing further development on a solution.

By the way, feedback on the scanning patch:

Could you consider using a per-CPU counter to track the number of active 
slots per CPU? That way the scan can skip CPUs that are not using hazard 
pointers. Another idea is to skip idle CPUs as well.

Have you also considered any asynchronous use case where maintaining a retired 
list would assist in RCU-style deferred reclaim of hazard-pointer objects?

thanks,

 - Joel 

> 
> It is based on v6.18.1.
> 
> Review is very welcome,
> 
> Thanks,
> 
> Mathieu
> 
> Cc: Nicholas Piggin <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Alan Stern <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> Cc: Zqiang <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Waiman Long <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: [email protected]
> Cc: Mateusz Guzik <[email protected]>
> Cc: Jonas Oberhauser <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> 
> Mathieu Desnoyers (4):
>  compiler.h: Introduce ptr_eq() to preserve address dependency
>  Documentation: RCU: Refer to ptr_eq()
>  hazptr: Implement Hazard Pointers
>  hazptr: Migrate per-CPU slots to backup slot on context switch
> 
> Documentation/RCU/rcu_dereference.rst |  38 +++-
> include/linux/compiler.h              |  63 +++++++
> include/linux/hazptr.h                | 241 ++++++++++++++++++++++++++
> include/linux/sched.h                 |   4 +
> init/init_task.c                      |   3 +
> init/main.c                           |   2 +
> kernel/Kconfig.preempt                |  10 ++
> kernel/Makefile                       |   2 +-
> kernel/fork.c                         |   3 +
> kernel/hazptr.c                       | 150 ++++++++++++++++
> kernel/sched/core.c                   |   2 +
> 11 files changed, 512 insertions(+), 6 deletions(-)
> create mode 100644 include/linux/hazptr.h
> create mode 100644 kernel/hazptr.c
> 
> --
> 2.39.5
> 
