On Wed, 2026-05-27 at 11:40 +0200, Hannes Reinecke wrote:
> On 5/22/26 18:44, Martin Wilck wrote:
> > Use shared_ptr to track the runner_context refcount.
> > 
> > Signed-off-by: Martin Wilck <[email protected]>
> > Reviewed-by: Benjamin Marzinski <[email protected]>
> > ---
> >   libmpathutil/runner.c | 30 +++++++++---------------------
> >   1 file changed, 9 insertions(+), 21 deletions(-)
> > 
> > diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
> > index 8c6d6b9..56abd03 100644
> > --- a/libmpathutil/runner.c
> > +++ b/libmpathutil/runner.c
> > @@ -30,7 +30,6 @@ const char *runner_state_name(int state)
> >   }
> >   
> >   struct runner_context {
> > -   int refcount;
> >     int status;
> >     struct timespec deadline;
> >     pthread_t thr;
> > @@ -39,17 +38,6 @@ struct runner_context {
> >     char __attribute__((aligned(sizeof(void *)))) data[];
> >   };
> >   
> > -static void release_context(struct runner_context *rctx)
> > -{
> > -   int n;
> > -
> > -   n = uatomic_sub_return(&rctx->refcount, 1);
> > -   assert(n >= 0);
> > -
> > -   if (n == 0)
> > -           free(rctx);
> > -}
> > -
> >   static void cleanup_context(struct runner_context **prctx)
> >   {
> >     struct runner_context *rctx = *prctx;
> > @@ -65,7 +53,7 @@ static void cleanup_context(struct runner_context
> > **prctx)
> >                     "%s: runner %p finished in state '%s'",
> > __func__, rctx,
> >                     runner_state_name(st));
> >     }
> > -   release_context(rctx);
> > +   put_shared_ptr(rctx);
> >   }
> >   
> >   static void *runner_thread(void *arg)
> > @@ -147,7 +135,7 @@ repeat:
> >   void release_runner(struct runner_context *rctx)
> >   {
> >     cancel_runner(rctx);
> > -   release_context(rctx);
> > +   put_shared_ptr(rctx);
> >   }
> >   
> >   int check_runner(struct runner_context *rctx, void *data,
> > unsigned int size)
> > @@ -195,17 +183,17 @@ struct runner_context *get_runner(runner_func
> > func, void *data,
> >             return NULL;
> >     }
> >   
> > -   rctx = malloc(sizeof(*rctx) + size);
> > +   rctx = alloc_shared_ptr(sizeof(*rctx) + size, NULL);
> >     if (!rctx)
> >             return NULL;
> >   
> >     rctx->func = func;
> >     /*
> > -    * We have to set the refcount to 2 here. The runner
> > thread may be
> > -    * cancelled before it even had the chance to increase the
> > refcount,
> > -    * which could result in a use-after-free in
> > cleanup_context().
> > +    * Take an additional reference here. The runner thread
> > may be
> > +    * cancelled before it even had the chance to take a
> > reference, which
> > +    * could result in a use-after-free in cleanup_context().
> >      */
> > -   uatomic_set(&rctx->refcount, 2);
> > +   get_shared_ptr(rctx);
> >     uatomic_set(&rctx->status, RUNNER_IDLE);
> >     memcpy(rctx->data, data, size);
> >   
> > @@ -222,8 +210,8 @@ struct runner_context *get_runner(runner_func
> > func, void *data,
> >   
> >     if (rc) {
> >             condlog(1, "%s: pthread_create(): %s", __func__,
> > strerror(rc));
> > -           uatomic_dec(&rctx->refcount);
> > -           release_context(rctx);
> > +           put_shared_ptr(rctx);
> > +           put_shared_ptr(rctx);
> >             return NULL;
> >     }
> >     return rctx;
> 
> Hmm. While I fully get the idea why one would want to use refcounted 
> objects, there really is only one snag: this is multipath.
> We really, _really_, should make sure the daemon can execute even if
> the root filesystem is not available.
> So if we start dropping the reference for the checker (eg if the last
> path drops and the checker is no longer used), we might end up
> freeing
> the memory (and dropping the modules altogether).

I am pretty sure that we handle this correctly. The current patch set 
doesn't change anything in this respect, except translating the hard-
coded refcounting into the shared_ptr framework.

> But if the path comes back we need to reinstate the checker, which
> not
> only requires additional memory (which might need a recursion into
> the
> filesystem to get free pages) but we also might need to read the
> checker
> module from disk (again). So plenty of opportunity to deaslock
> waithing
> for the disk be become readable.

In checker_get(), we call add_checker_class() if the class is not yet
loaded, which will take one ref on the class, and then get_shared_ptr()
for every path (including the one for which add_checker_class() had
been called), which means we have (number of paths + 1) references on
the class, and will only drop the last ref when multipathd calls
cleanup_checkers() during exit.

Therefore I don't think it's possible that we unload the shared library
prematurely, and have to reload it later.

Regards
Martin

Reply via email to