Hi Ben,

On Thu, 2025-12-18 at 19:38 -0500, Benjamin Marzinski wrote:
> On Wed, Dec 17, 2025 at 10:21:10PM +0100, Martin Wilck wrote:
> > The libudev documentation [1] states that libudev is not thread-safe
> > and shouldn't be used from multithreaded programs. While the man page
> > also states that using libudev in multithreaded programs is unsafe
> > even if locking is used, there's not much else we can do in
> > multipath-tools to mitigate the problem, as we are using libudev in
> > multiple threads all over the place.
> 
> Ugh. I'm a little confused on what exactly could go wrong. Do you know
> what they are concerned with, that locking cannot fix?

No. I've asked our SUSE systemd experts but they didn't know either.
The man page mentions the use of setenv()/getenv(), but I don't think
that matters for us.

> It sounds like the functions themselves are safe to call from multiple
> threads, as long as they are working on separate data. If they were
> manipulating global state that wouldn't be true.

They aren't working on completely separate data; that's the problem. We
only have one "struct udev", which provides the context for the entire
operation of libudev in our code. The udev_device structs we use are
created from that object and link to it internally. Individual
udev_device structs are also linked to each other, e.g. through parent
relationships. That's why devices obtained by udev_device_get_parent()
et al. don't need to be unref()'d; they're referenced by their children
and are supposed to be freed together with the last child.
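
To illustrate that lifetime rule, here's a minimal sketch (the syspath
is just an example, and error handling is omitted; this is not actual
multipath-tools code):

#include <libudev.h>

int main(void)
{
	struct udev *udev = udev_new();
	struct udev_device *dev, *parent;

	/* "/sys/block/sda" is just an example syspath */
	dev = udev_device_new_from_syspath(udev, "/sys/block/sda");
	/* the parent is NOT separately referenced; it is attached to
	 * the child device and cleaned up together with it */
	parent = udev_device_get_parent(dev);
	/* ... use parent ... */
	udev_device_unref(dev);	/* frees the parent as well; don't unref it */
	udev_unref(udev);
	return 0;
}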

As the udev operations are opaque, we can't make any assumptions about
which of the libudev functions allocate memory or otherwise modify
shared data structures. Some properties are fetched "lazily", only when
the user actually needs them, and are then cached inside libudev. The
2-layer architecture, with udev_device actually being a shallow wrapper
around sd_device etc., doesn't make it any easier to assess the code
flow inside the systemd libraries.

These issues should be fixed by serializing the libudev accesses with a
mutex. Like you, I don't understand what could still go wrong once
that's done.

In theory we could have a separate "struct udev" object per thread, but
keeping these in sync with each other would be a nightmare. We might as
well give up running multithreaded altogether.

> 
> > Add wrapper code that wraps all calls to libudev in a critical
> > section holding a specific mutex.
> > 
> > [1] https://man7.org/linux/man-pages/man3/libudev.3.html
> 
> I assume that there must be times when we call udev outside of the
> vecs lock. Did you check how often that is? Not that it would be great
> to pile even more work on the vecs lock.

I wouldn't want to do that. Even if we could create a comprehensive
assessment now, it could become incorrect at any time just by adding an
innocent libudev call somewhere. Also, I wouldn't want to overload the
vecs lock further. This code is ugly, but I am pretty sure that the
mutex operations will be lightweight. The mutex is always at the
bottom of our call stack, so no ABBA deadlock is possible.
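
To give an idea of what the wrapping looks like, here's a minimal
sketch of one such wrapper; the function name and the static pthread
mutex are illustrative, not a literal excerpt from the patch:

#include <libudev.h>
#include <pthread.h>

static pthread_mutex_t udev_mutex = PTHREAD_MUTEX_INITIALIZER;

/* hypothetical wrapper around one libudev call; every libudev entry
 * point gets an equivalent wrapper, so all accesses are serialized */
struct udev_device *
wrap_udev_device_new_from_syspath(struct udev *udev, const char *syspath)
{
	struct udev_device *dev;

	pthread_mutex_lock(&udev_mutex);
	dev = udev_device_new_from_syspath(udev, syspath);
	pthread_mutex_unlock(&udev_mutex);
	return dev;
}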

I started out by just wrapping some libudev calls, but I soon realized
that we can't do that without making assumptions about libudev's inner
workings, which we shouldn't (see above).

That said, I agree that it's unclear whether this is really worth the
effort. I did it because I was hoping to fix the memory leak issue #129
with it, but that attempt was unsuccessful. So we have no evidence that
this serialization fixes any actual issue. multipathd has been using
libudev from a multi-threaded context for 20 years, and we have never
encountered a problem that was caused by that (or if we did, we didn't
realize it). At least in theory, the mutex could cause a deadlock or a
noticeable delay if some libudev operation blocks or crashes, so the
change is not without risk.

So, consider this part of the patch set an RFC.

OTOH, the libudev man page is pretty clear about libudev not being
thread-safe.

> Also, git flagged some whitespace errors.

Oops, will fix.

Thanks,
Martin
