> Date: Thu, 6 May 2021 21:59:12 +1000
> From: Jonathan Gray <[email protected]>
>
> On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote:
> > Hi,
> >
> > On a hunch I added additional parameter checks to task_add(9) and
> > task_del(9) and caught intel(4) doing something strange.
> >
> > The patch is straightforward: check that the taskq pointer tq is not
> > NULL. In the current code we return early if a flag is set or cleared
> > in the task w, in which case we don't catch bogus taskq inputs, which
> > is why the machine boots fine without the extra checks.
> >
> > The patch:
> >
> > Index: kern_task.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_task.c,v
> > retrieving revision 1.31
> > diff -u -p -r1.31 kern_task.c
> > --- kern_task.c 1 Aug 2020 08:40:20 -0000 1.31
> > +++ kern_task.c 5 May 2021 21:29:08 -0000
> > @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task *
> > {
> > int rv = 0;
> >
> > + if (tq == NULL)
> > + panic("%s: NULL taskq", __func__);
> > +
> > if (ISSET(w->t_flags, TASK_ONQUEUE))
> > return (0);
> >
> > @@ -378,6 +381,9 @@ int
> > task_del(struct taskq *tq, struct task *w)
> > {
> > int rv = 0;
> > +
> > + if (tq == NULL)
> > + panic("%s: NULL taskq", __func__);
> >
> > if (!ISSET(w->t_flags, TASK_ONQUEUE))
> > return (0);
> >
> > And here is the panic on my machine. I had to reconstruct it from
> > OCR, the machine has no serial port, sorry if there are typos.
>
> boot crash can be helpful for such machines
>
> >
> > panic: task_del: NULL taskq
> > Stopped at db_enter+0xa: popq %rbp
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > 513524 44883 0 Ox14000 0x200 3 update
> > 352928 82402 0 0x14000 0x200 2 cleaner
> > 382195 66035 0 Ox14000 0x200 1 reaper
> > ...
> > db_enter() at db_enter+Oxa
> > panic(ffffffff81db24fb) at panic+0x12f
> > task del(0,ffff8000010633e0) at task_del+Oxa8
> > edp_panel vdd_on(ffff800001063128) at edp_panel_vdd_on+0x6a
> > intel_dp_aux_xfer(ffff800001063128, ffffffff82512a20,4,
> > ffffffff82512400,2,0) at intel_dp_aux_xfer+0x18b
> > intel_dp_aux_transfer(ffff8000010631e8, ffffffff82512a88) at
> > intel_dp_aux_transfer+0x183
> > drm_dp_dpcd_access(ffff8000010631e8,9,0,ffff80000106313a, 1) at
> > drm_dp_dpcd_access+Oxa9
> > drm_dp_dpcd_read(ffff8000010631e8,0,ffff80000106313a, f) at
> > drm_dp_dpcd_read+0x61
> > intel_dp_read_dpcd(ffff800001063128) at intel_dp_read_dpcd+0x45
> > intel_dp_init_connector(ffff800001063000, ffff800001064000) at
> > intel_dp_init_connector+0x988
> > intel_ddi_init(ffff800000272000,0) at intel_ddi_init+0x454
> > intel_modeset_init(ffff800000272000) at intel_modeset_init+0x1c9f
> > i915_driver_probe(ffff800000272000, ffffffff82052f98) at
> > i915_driver_probe+0x7df
> > inteldrm_attachhook(ffff800000272000) at inteldrm_attachhook+0x46
> > end trace frame: Oxffffffff82512700, count: 0
> >
> > >From the backtrace, I gather the following:
> >
> > edp_panel_vdd_on() calls clear_delayed_work() which is just a macro
>
> cancel_delayed_work()
>
> > that calls task_del(). And for whatever reason the taskq passed to
> > task_del() is NULL. Maybe there is a missing INIT_DELAYED_WORK() call
> > somewhere prior to this point?
>
> the call to that is in
> INIT_DELAYED_WORK(&intel_dp->panel_vdd_work, edp_panel_vdd_work);
> but tq doesn't get set until work is scheduled as there are interfaces
> to pick a tq when scheduling work.
>
> So perhaps you want something like this to catch the cases where work is
> cancelled before it is scheduled.
Yeah, I think that makes sense.
ok kettenis@
> Index: sys/dev/pci/drm/include/linux/workqueue.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v
> retrieving revision 1.4
> diff -u -p -r1.4 workqueue.h
> --- sys/dev/pci/drm/include/linux/workqueue.h 14 Feb 2021 03:42:55 -0000
> 1.4
> +++ sys/dev/pci/drm/include/linux/workqueue.h 6 May 2021 11:51:14 -0000
> @@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq,
> static inline void
> cancel_work_sync(struct work_struct *work)
> {
> - task_del(work->tq, &work->task);
> + if (work->tq != NULL)
> + task_del(work->tq, &work->task);
> }
>
> #define work_pending(work) task_pending(&(work)->task)
> @@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct
> static inline bool
> cancel_delayed_work(struct delayed_work *dwork)
> {
> + if (dwork->tq == NULL)
> + return false;
> if (timeout_del(&dwork->to))
> return true;
> return task_del(dwork->tq, &dwork->work.task);
> @@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work
> static inline bool
> cancel_delayed_work_sync(struct delayed_work *dwork)
> {
> + if (dwork->tq == NULL)
> + return false;
> if (timeout_del(&dwork->to))
> return true;
> return task_del(dwork->tq, &dwork->work.task);
>
>