Hi,

On Fri, Mar 3, 2023 at 11:08 AM Andreas Gruenbacher <agrue...@redhat.com> wrote:
>
> On Fri, Mar 3, 2023 at 4:53 PM Andreas Gruenbacher <agrue...@redhat.com> wrote:
> > diff --git a/dlm_controld/plock.c b/dlm_controld/plock.c
> > index 39bdd1f6..588bcaaa 100644
> > --- a/dlm_controld/plock.c
> > +++ b/dlm_controld/plock.c
> > @@ -8,6 +8,7 @@
> >
> >  #include "dlm_daemon.h"
> >  #include <linux/dlm_plock.h>
> > +#include <sys/sdt.h>
> >

does this require an additional dependency?
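If I understand the sdt.h mechanics right, it should only be a
build-time dependency: as far as I know the header is shipped by
systemtap (systemtap-sdt-devel on Fedora/RHEL), the probe macros
compile down to a nop plus an ELF note, and nothing extra has to be
linked. A minimal self-contained sketch of the same pattern (sdt_demo.c
is just my example, not part of the patch):

/* sdt_demo.c - minimal USDT probe example; builds with:
 *   gcc -o sdt_demo sdt_demo.c
 * <sys/sdt.h> is header-only, so no -l flag is needed; the macro
 * emits a nop and records the probe in an ELF .note.stapsdt section.
 */
#include <stdio.h>
#include <sys/sdt.h>

int main(void)
{
	int value = 42;

	/* provider "demo", probe "tick", one argument */
	DTRACE_PROBE1(demo, tick, value);

	printf("probe fired with value %d\n", value);
	return 0;
}

If that builds, `readelf -n ./sdt_demo` should list the probe in the
.note.stapsdt section, and tools like perf or bpftrace can attach to it
at runtime. Please correct me if dlm_controld's build environment makes
this more complicated.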
> >  /* FIXME: remove this once everyone is using the version of
> >   * dlm_plock.h which defines it */
> > @@ -211,6 +212,11 @@ static uint64_t dt_usec(const struct timeval *start, const struct timeval *stop)
> >  static void plock_print_start_waiter(const struct lockspace *ls,
> >                                       struct lock_waiter *w)
> >  {
> > +       const struct dlm_plock_info *info = &w->info;
> > +
> > +       DTRACE_PROBE7(dlm_controld, plock_wait_begin, info->number, w, info->start,
> > +                     info->end, info->nodeid, info->pid, info->owner);
> > +
> >         log_plock(ls, "state waiter start %llx %p %llx-%llx %d/%u/%llx",
> >                   (unsigned long long)w->info.number,
> >                   w,
>
> An additional question I have about those events is which information
> to log. We need to be able to identify which inode the request is for
> (info->number), the locking range (info->start and info->end), whether
> it is a read or write lock, and which context in the kernel the request
> refers to (this seems to be info->owner, but I'm not entirely sure
> about that). The pid may be interesting as well, but are w or

w is always true for waiters; it's just stored there because the
structure is used in other places as well. It indicates whether the
request is a trylock or not, and a waiter is never a trylock, so for
waiters w == 1.

> info->nodeid really useful? We should try to avoid exposing

Yes, it is useful. As far as I can see, you need the nodeid/pid
combination to tell which pid held the lock state for a specific time.
nodeid tells you which node the pid runs on; the same pid _could_ occur
twice in the cluster, and without the nodeid you don't know which
process had the state, either lock state or, in this case, contention
state.

> unnecessary implementation details like the addresses of objects
> inside dlm_controld.

I am using it as a unique handle (probably the nodeid needs to be
considered here as well, for the same reason as with the pid, just more
unlikely) to match the start trace/log entry with the end trace entry.
I used a hash value of some of the fields, but printing all of those
values out twice would just blow up the trace/log size. Or am I missing
something?

- Alex
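P.S.: To make the matching idea concrete, here is a small
self-contained sketch of what I mean. The plock_wait_end probe name and
the struct are made up by me for illustration and are not part of the
patch; a consumer matches a wait_begin to the following wait_end either
via the (number, nodeid, pid, owner) tuple or via the object address
passed as a handle:

/* match_demo.c - sketch of paired begin/end probes; builds with:
 *   gcc -o match_demo match_demo.c
 * assuming <sys/sdt.h> from systemtap-sdt-devel is installed.
 */
#include <stdint.h>
#include <unistd.h>
#include <sys/sdt.h>

/* stand-in for the identifying fields of struct dlm_plock_info */
struct req {
	uint64_t number;   /* inode number */
	uint64_t owner;    /* lock owner context from the kernel */
	uint32_t nodeid;   /* cluster node the pid runs on */
	uint32_t pid;
};

static void wait_begin(const struct req *r)
{
	/* (number, nodeid, pid, owner) is the cluster-wide correlation
	 * key; the address r doubles as a cheap process-local handle */
	DTRACE_PROBE5(demo, plock_wait_begin, r->number, r, r->nodeid,
		      r->pid, r->owner);
}

static void wait_end(const struct req *r)
{
	/* same argument layout as plock_wait_begin on purpose, so a
	 * tracer can key on the same fields at both ends */
	DTRACE_PROBE5(demo, plock_wait_end, r->number, r, r->nodeid,
		      r->pid, r->owner);
}

int main(void)
{
	struct req r = {
		.number = 0x42, .owner = 0x1234, .nodeid = 1, .pid = 4711,
	};

	wait_begin(&r);
	usleep(1000);	/* stand-in for the time the lock is contended */
	wait_end(&r);
	return 0;
}

Passing the address costs nothing extra and avoids printing the whole
identifying tuple twice; the tuple is still available at both ends in
case the handle alone is ambiguous across nodes.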