On Wed, 2026-05-27 at 15:13 +0200, Hannes Reinecke wrote:
> On 5/27/26 12:11, Martin Wilck wrote:
> > On Wed, 2026-05-27 at 11:40 +0200, Hannes Reinecke wrote:
> >
> >
> > > But if the path comes back we need to reinstate the checker, which
> > > not only requires additional memory (which might need a recursion
> > > into the filesystem to get free pages) but we also might need to
> > > read the checker module from disk (again). So plenty of opportunity
> > > to deadlock waiting for the disk to become readable.
> >
> > In checker_get(), we call add_checker_class() if the class is not yet
> > loaded, which will take one ref on the class, and then get_shared_ptr()
> > for every path (including the one for which add_checker_class() had
> > been called), which means we have (number of paths + 1) references on
> > the class, and will only drop the last ref when multipathd calls
> > cleanup_checkers() during exit.
> >
> > Therefore I don't think it's possible that we unload the shared library
> > prematurely, and have to reload it later.
> >
> Hmm.
>
> Really hmmmmm.
>
> If the checkers are loaded during startup, and unloaded on shutdown,
> there really is no point in refcounting them, is there?
> Wouldn't global pointers to each checker be sufficient here, avoiding
> the need for refcounting altogether?

Currently checker and prioritizer DSOs aren't loaded during startup.
They are loaded when the first path that uses the given checker is
initialized.
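
To make the counting concrete, here is a minimal sketch in C. It is not the libmultipath code: the sketch_* names and the struct are made up for illustration, only a single checker class is supported, and the DSO load is simulated by a counter instead of a real dlopen().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a checker class backed by a shared object. */
struct checker_class_sketch {
	char name[16];
	int refcount;	/* 1 for the class list + 1 per path using it */
};

/* One-entry "class list"; the real code keeps a list of classes. */
static struct checker_class_sketch *loaded_class;
/* Counts simulated DSO loads, to show that no reload happens. */
static int total_loads;

/* Plays the role of add_checker_class(): load once, take the list's ref. */
static struct checker_class_sketch *sketch_add_class(const char *name)
{
	struct checker_class_sketch *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	strncpy(c->name, name, sizeof(c->name) - 1);
	c->refcount = 1;
	total_loads++;
	loaded_class = c;
	return c;
}

/* Plays the role of checker_get() for one path: lazy load, then one ref. */
static struct checker_class_sketch *sketch_get(const char *name)
{
	struct checker_class_sketch *c = loaded_class;

	if (!c)
		c = sketch_add_class(name);
	if (!c || strcmp(c->name, name) != 0)
		return NULL;	/* sketch limitation: one class only */
	c->refcount++;
	return c;
}

/* Drops one ref; "unloads" (frees) the class when the count hits zero. */
static int sketch_put(struct checker_class_sketch *c)
{
	assert(c && c->refcount > 0);
	if (--c->refcount > 0)
		return c->refcount;
	if (loaded_class == c)
		loaded_class = NULL;
	free(c);
	return 0;
}

/* Plays the role of cleanup_checkers(): drops the list's own ref at exit. */
static void sketch_cleanup(void)
{
	if (loaded_class)
		sketch_put(loaded_class);
}
```

With two paths using the same checker the count is 3. Removing both paths leaves the list's reference in place, so a path that comes back takes a new ref on the already loaded class and total_loads stays at 1.
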
I agree that it would be possible to simplify the code by just loading
all checkers and prioritizers in advance, in which case we wouldn't
need refcounting, like you wrote. Actually, we don't need to use shared
objects in the first place; we could simply include all the checker and
prioritizer code in libmultipath itself.

The shared object architecture dates back to multipath-tools 0.4.9
(2008). We (actually, you ;-) ) added refcounting to fix races in
323090f ("Use refcounting for checkers"). So far it hasn't occurred to
me to change this architecture, which has been working well for over a
decade.

Anyway, this discussion is orthogonal to my current patch set.

Regards
Martin