Hi Aaron,

On Mon, 2025-09-01 at 20:31 -0400, Aaron Merey wrote:
> __libdw_dieabbrev uses the abbrev_lock rwlock to synchronize access to the
> Dwarf_Die abbrev field as well as its lazy loading.  Calls to rwlock_wrlock
> and unlock incur significant performance overhead even in single-threaded
> cases.
> 
> This patch implements thread safety in __libdw_dieabbrev with GCC __atomic
> builtins instead of an rwlock, improving performance.

configure checks for C11 plus atomics support. So you should be able to
just use #include <stdatomic.h> and atomic_compare_exchange* functions.

>   abbrev_lock has been
> changed to a mutex and now synchronizes access to the Dwarf_CU field
> last_abbrev_offset in __libdw_findabbrev.
> 
> libdw/
>       * dwarf_begin_elf.c (valid_p): Change type of abbrev_lock from
>       rwlock to mutex.
>       * dwarf_end.c (cu_free): Ditto.
>       * dwarf_tag.c (__libdw_findabbrev): Protect
>       cu->last_abbrev_offset with mutex.
>       * libdwP.h (struct Dwarf_CU): Change type of abbrev_lock from
>       rwlock to mutex.
>       (__libdw_dieabbrev): Use __atomic builtins instead of rwlock.
>       * libdw_findcu.c (__libdw_intern_next_unit): Change type of
>       abbrev_lock from rwlock to mutex.

Apart from using stdatomic.h and the standard C11 functions I think
this is all correct.

> One issue we need to address by the time we remove the "experimental"
> notice from --enable-thread-safety is communicating to the library
> caller that they should not attempt to read Dwarf_Die.abbrev before
> it is loaded. Since the Dwarf_Die struct is public and transparent to
> library callers, one thread attempting to read the abbrev field while
> another thread lazy loads the abbrev is undefined behavior.
> 
> I don't think we can fully eliminate the possibility of UB under a lazy
> loading design given that we should not change the Dwarf_Die layout and
> its public accessibility.  Lazy loading implementation cannot prevent
> library callers from reading partially written abbrev bytes no matter
> what internal synchronization methods are used.

Maybe we just have to say that you cannot share Dwarf_Die references
between threads? So if you want to use a Dwarf_Die from another thread
you have to copy it instead by passing it by reference? It is a fairly
small struct so passing a copy instead of a reference shouldn't be a
very high overhead. Or maybe we should say that you should pass the
Dwarf and Dwarf_Die.addr to another thread and reconstruct it using
dwarf_die_addr_die? That would simplify things a bit on the
implementation side I think. But might make parallel algorithm using
libdw more awkward?

Thanks,

Mark

Reply via email to