Hi Aaron,

On Mon, 2026-05-25 at 10:48 -0400, Aaron Merey wrote:
> My email client didn't properly wrap the lines of the proposal. Here
> it is again with wrapping:
>
> Summary: We should stipulate to users that elfutils library functions
> updating a library data structure are not thread safe with respect to
> other calls acting on that data structure or related ones. This will
> facilitate a more performant thread safety implementation.
> Benchmarking data demonstrating this is included below.
>
> Thread safety in elfutils libraries is currently in an experimental
> state but we are working towards official support. The implicit
> contract we have been designing thread safety around has aimed for
> maximum guarantees: elfutils library functions should be safe to
> concurrently call alongside most other library functions. These very
> strong guarantees aren't necessary for typical elfutils consumers and
> they require heavy use of internal synchronization within the
> libraries that hurts performance and limits multithreading
> scalability. I want to propose that we relax elfutils library thread
> safety guarantees to facilitate performance improvements while still
> accommodating typical consumer use cases expected when thread safety
> is officially available.
>
> We should not guarantee thread safety when simultaneously reading data
> from a handle (gelf_get*, elf_getscn, elf_nextscn, elf_strptr,
> elf_getdata, etc) and updating the handle or associated handles
> (elf_newdata, elf_update, gelf_update_*, elf_flag*, etc). We should
> clearly document for users that they are responsible for serializing
> such calls. This will allow us to reduce internal rwlock usage and in
> some cases replace it with atomic flags that track whether internal
> lazy init has happened. We've discussed replacing an rwlock with an
> atomic flag last year [1] and demonstrated that it's more performant
> specifically for __libdw_dieabbrev.
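
To make the "atomic flag that tracks whether lazy init has happened" idea concrete, here is a minimal C11 sketch. It is not the elfutils implementation: struct lazy_handle, lazy_handle_table, run_concurrent_readers and the other names are hypothetical. Once the lazily built data exists, each reader pays a single acquire load instead of taking an rwlock read lock, and only the first callers ever contend on the mutex. It covers concurrent readers only. Anything that modifies or frees the table is assumed to be serialized by the caller, which is the contract proposed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle with lazily built internal data.  This is NOT a
   real libelf/libdw type; it only illustrates the locking pattern.  */
struct lazy_handle
{
  atomic_bool ready;          /* true once table is fully built */
  pthread_mutex_t init_lock;  /* taken only on the first-use slow path */
  int *table;                 /* lazily built, read-only after init */
  size_t len;
};

static void
lazy_handle_init (struct lazy_handle *h, size_t len)
{
  atomic_init (&h->ready, false);
  pthread_mutex_init (&h->init_lock, NULL);
  h->table = NULL;
  h->len = len;
}

/* Safe for concurrent readers.  Once init is done the fast path is one
   acquire load with no locking.  Callers that modify or free the table
   must be serialized externally (not handled here).  */
static const int *
lazy_handle_table (struct lazy_handle *h)
{
  if (atomic_load_explicit (&h->ready, memory_order_acquire))
    return h->table;

  pthread_mutex_lock (&h->init_lock);
  /* Re-check: another thread may have finished init while we waited.  */
  if (!atomic_load_explicit (&h->ready, memory_order_relaxed))
    {
      int *t = malloc (h->len * sizeof *t);
      if (t == NULL)
        abort ();
      for (size_t i = 0; i < h->len; i++)
        t[i] = (int) i;
      h->table = t;
      /* Release store publishes the table contents before the flag.  */
      atomic_store_explicit (&h->ready, true, memory_order_release);
    }
  pthread_mutex_unlock (&h->init_lock);
  return h->table;
}

struct reader_arg
{
  struct lazy_handle *h;
  const int *seen;  /* table pointer this thread observed */
  long sum;         /* sum of the table entries */
};

static void *
reader (void *p)
{
  struct reader_arg *a = p;
  a->seen = lazy_handle_table (a->h);
  a->sum = 0;
  for (size_t i = 0; i < a->h->len; i++)
    a->sum += a->seen[i];
  return NULL;
}

/* Run NTHREADS (1..16) concurrent readers on one fresh handle of LEN > 0
   entries; check they all saw the same table and return its sum.  */
static long
run_concurrent_readers (size_t len, int nthreads)
{
  pthread_t tids[16];
  struct reader_arg args[16];
  struct lazy_handle h;

  assert (len > 0 && nthreads > 0 && nthreads <= 16);
  lazy_handle_init (&h, len);
  for (int i = 0; i < nthreads; i++)
    {
      args[i] = (struct reader_arg) { &h, NULL, 0 };
      if (pthread_create (&tids[i], NULL, reader, &args[i]) != 0)
        abort ();
    }
  for (int i = 0; i < nthreads; i++)
    pthread_join (tids[i], NULL);
  for (int i = 1; i < nthreads; i++)
    {
      assert (args[i].seen == args[0].seen);  /* built exactly once */
      assert (args[i].sum == args[0].sum);
    }
  long sum = args[0].sum;
  free (h.table);
  pthread_mutex_destroy (&h.init_lock);
  return sum;
}
```

For example, run_concurrent_readers (100, 8) has eight threads race on first use. Exactly one of them builds the table, and every thread sums it to 4950. The second check under the mutex keeps two racing first callers from building the table twice. The release store paired with the acquire load makes the finished table visible to readers that skip the lock entirely.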
OK, I think I agree with this analysis. So the libraries should
guarantee thread-safety for any concurrent "read" operations on a
specific handle (even if the underlying data structure is created
lazily on first access, in which case we do use internal locking), but
require the user to use explicit/external locking when combining
concurrent "read" and "write" operations on the same handle.

What guarantees do we give on the "read" (sub)handles after any
"write" operation? E.g. when the code gets an Elf_Scn * from an Elf *
and then an Elf_Data * from the Elf_Scn *, are the Elf_Scn * and
Elf_Data * still valid after some "write" operation on the Elf *? Or
are they invalidated and no longer usable after a "write" operation?

> Below I detail additional benchmarking data demonstrating that this
> approach should be adopted more generally. While much of the existing
> rwlock usage in libelf and libdw is acceptable, in the worst case
> certain rwlocks impose an unacceptable performance hit and limit
> multithreading scalability. My conclusion is that we should relax our
> thread safety guarantees as I've described here and, before we start
> to officially support thread safety, clearly document the thread
> safety contract for users (ex. which functions they are responsible
> for serializing and when they should serialize, which are thread safe
> accessors, etc) and remove/replace hotpath rwlocks that cause
> performance problems. The good news is that from the benchmarking I've
> done so far, I can only find 3 libelf functions that need to be
> addressed.

It looks like focusing on libelf alone already provides a good
concurrency win, even for applications that also use libdw. Is that
because libdw doesn't contain problematic locks? Or is libdw not
really thread-safe already?

Cheers,

Mark
