On Wed, 2026-05-27 at 11:40 +0200, Hannes Reinecke wrote: > On 5/22/26 18:44, Martin Wilck wrote: > > Use shared_ptr to track the runner_context refcount. > > > > Signed-off-by: Martin Wilck <[email protected]> > > Reviewed-by: Benjamin Marzinski <[email protected]> > > --- > > libmpathutil/runner.c | 30 +++++++++--------------------- > > 1 file changed, 9 insertions(+), 21 deletions(-) > > > > diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c > > index 8c6d6b9..56abd03 100644 > > --- a/libmpathutil/runner.c > > +++ b/libmpathutil/runner.c > > @@ -30,7 +30,6 @@ const char *runner_state_name(int state) > > } > > > > struct runner_context { > > - int refcount; > > int status; > > struct timespec deadline; > > pthread_t thr; > > @@ -39,17 +38,6 @@ struct runner_context { > > char __attribute__((aligned(sizeof(void *)))) data[]; > > }; > > > > -static void release_context(struct runner_context *rctx) > > -{ > > - int n; > > - > > - n = uatomic_sub_return(&rctx->refcount, 1); > > - assert(n >= 0); > > - > > - if (n == 0) > > - free(rctx); > > -} > > - > > static void cleanup_context(struct runner_context **prctx) > > { > > struct runner_context *rctx = *prctx; > > @@ -65,7 +53,7 @@ static void cleanup_context(struct runner_context > > **prctx) > > "%s: runner %p finished in state '%s'", > > __func__, rctx, > > runner_state_name(st)); > > } > > - release_context(rctx); > > + put_shared_ptr(rctx); > > } > > > > static void *runner_thread(void *arg) > > @@ -147,7 +135,7 @@ repeat: > > void release_runner(struct runner_context *rctx) > > { > > cancel_runner(rctx); > > - release_context(rctx); > > + put_shared_ptr(rctx); > > } > > > > int check_runner(struct runner_context *rctx, void *data, > > unsigned int size) > > @@ -195,17 +183,17 @@ struct runner_context *get_runner(runner_func > > func, void *data, > > return NULL; > > } > > > > - rctx = malloc(sizeof(*rctx) + size); > > + rctx = alloc_shared_ptr(sizeof(*rctx) + size, NULL); > > if (!rctx) > > return NULL; > > > > rctx->func = func; > > /* > > - * We have to set the refcount to 2 here. The runner > > thread may be > > - * cancelled before it even had the chance to increase the > > refcount, > > - * which could result in a use-after-free in > > cleanup_context(). > > + * Take an additional reference here. The runner thread > > may be > > + * cancelled before it even had the chance to take a > > reference, which > > + * could result in a use-after-free in cleanup_context(). > > */ > > - uatomic_set(&rctx->refcount, 2); > > + get_shared_ptr(rctx); > > uatomic_set(&rctx->status, RUNNER_IDLE); > > memcpy(rctx->data, data, size); > > > > @@ -222,8 +210,8 @@ struct runner_context *get_runner(runner_func > > func, void *data, > > > > if (rc) { > > condlog(1, "%s: pthread_create(): %s", __func__, > > strerror(rc)); > > - uatomic_dec(&rctx->refcount); > > - release_context(rctx); > > + put_shared_ptr(rctx); > > + put_shared_ptr(rctx); > > return NULL; > > } > > return rctx; > > Hmm. While I fully get the idea why one would want to use refcounted > objects, there really is only one snag: this is multipath. > We really, _really_, should make sure the daemon can execute even if > the root filesystem is not available. > So if we start dropping the reference for the checker (eg if the last > path drops and the checker is no longer used), we might end up > freeing > the memory (and dropping the modules altogether).
I am pretty sure that we handle this correctly. The current patch set doesn't change anything in this respect, except translating the hard- coded refcounting into the shared_ptr framework. > But if the path comes back we need to reinstate the checker, which > not > only requires additional memory (which might need a recursion into > the > filesystem to get free pages) but we also might need to read the > checker > module from disk (again). So plenty of opportunity to deaslock > waithing > for the disk be become readable. In checker_get(), we call add_checker_class() if the class is not yet loaded, which will take one ref on the class, and then get_shared_ptr() for every path (including the one for which add_checker_class() had been called), which means we have (number of paths + 1) references on the class, and will only drop the last ref when multipathd calls cleanup_checkers() during exit. Therefore I don't think it's possible that we unload the shared library prematurely, and have to reload it later. Regards Martin
