On 12 Mar 2015, at 20:59, Christos Zoulas <[email protected]> wrote:
> | > | Now we have a deadlock, softlck/0 waits for the mutex and therefore > | > | callouts will no longer be processed and ciss holds the mutex and waits > | > | for a callout through cv_timedwait. > | > | > Thanks for looking into it! Part of the ciss_ioctl_vol() (the pdid part) > | > does things with XS_CTL_POLL so that it does not involve any mutexes. It > | > would be simple to change the ldid part to do the same. Should we do that? The mutex involved is the sme_mtx protecting the struct sysmon_envsys, so our problem doesn't come from missing POLL. > | > | - Sleeping up to 60 seconds in a function used by a callout is wrong. > | > > | > Yes, but many disk drivers seem to violate that. How do we fix this? > | > Making a separate thread that updates statistics for each driver seems > | > suboptimal? We already have it. If I understand sysmon right, it is already based on workqueues (the ciss0 thread here): The workqueue updates sensors every sme->sme_events_timeout seconds, default is 30 seconds. Workqueue items get enqueued from a callout. Both running a workqueue item and processing the callout locks the same mutex sme->sme_mtx. For this to work the workqueue must complete before the callout fires: sme->sme_nsensors * ccb->ccb_xs->timeout < sme->sme_events_timeout In our ciss case we could set: sc->sc_sme->sme_events_timeout = 30 ccb->ccb_xs->timeout= 20 / sc->maxunits to become safe. Hope I got this right so far ... -- J. Hannken-Illjes - [email protected] - TU Braunschweig (Germany)
