On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote:
> Hi,
> 
> On a hunch I added additional parameter checks to task_add(9) and
> task_del(9) and caught intel(4) doing something strange.
> 
> The patch is straightforward: check that the taskq pointer tq is not
> NULL.  In the current code we return early if a flag is set or cleared
> in the task w, in which case we don't catch bogus taskq inputs, which
> is why the machine boots fine without the extra checks.
> 
> The patch:
> 
> Index: kern_task.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_task.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 kern_task.c
> --- kern_task.c       1 Aug 2020 08:40:20 -0000       1.31
> +++ kern_task.c       5 May 2021 21:29:08 -0000
> @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task *
>  {
>       int rv = 0;
>  
> +     if (tq == NULL)
> +             panic("%s: NULL taskq", __func__);
> +
>       if (ISSET(w->t_flags, TASK_ONQUEUE))
>               return (0);
>  
> @@ -378,6 +381,9 @@ int
>  task_del(struct taskq *tq, struct task *w)
>  {
>       int rv = 0;
> +
> +     if (tq == NULL)
> +             panic("%s: NULL taskq", __func__);
>  
>       if (!ISSET(w->t_flags, TASK_ONQUEUE))
>               return (0);
> 
> And here is the panic on my machine.  I had to reconstruct it from
> OCR, the machine has no serial port, sorry if there are typos.

boot crash can be helpful for such machines

> 
> panic: task_del: NULL taskq
> Stopped at db_enter+0xa: popq %rbp
>     TID          PID  UID     PRFLAGS    PFLAGS  CPU  COMMAND
>  513524  44883    0     Ox14000     0x200    3  update
>  352928  82402    0     0x14000     0x200    2  cleaner
>  382195  66035    0     Ox14000     0x200    1  reaper
> ...
> db_enter() at db_enter+Oxa
> panic(ffffffff81db24fb) at panic+0x12f
> task del(0,ffff8000010633e0) at task_del+Oxa8
> edp_panel vdd_on(ffff800001063128) at edp_panel_vdd_on+0x6a
> intel_dp_aux_xfer(ffff800001063128, ffffffff82512a20,4, ffffffff82512400,2,0) 
> at intel_dp_aux_xfer+0x18b
> intel_dp_aux_transfer(ffff8000010631e8, ffffffff82512a88) at 
> intel_dp_aux_transfer+0x183
> drm_dp_dpcd_access(ffff8000010631e8,9,0,ffff80000106313a, 1) at 
> drm_dp_dpcd_access+Oxa9
> drm_dp_dpcd_read(ffff8000010631e8,0,ffff80000106313a, f) at 
> drm_dp_dpcd_read+0x61
> intel_dp_read_dpcd(ffff800001063128) at intel_dp_read_dpcd+0x45
> intel_dp_init_connector(ffff800001063000, ffff800001064000) at 
> intel_dp_init_connector+0x988
> intel_ddi_init(ffff800000272000,0) at intel_ddi_init+0x454
> intel_modeset_init(ffff800000272000) at intel_modeset_init+0x1c9f
> i915_driver_probe(ffff800000272000, ffffffff82052f98) at 
> i915_driver_probe+0x7df
> inteldrm_attachhook(ffff800000272000) at inteldrm_attachhook+0x46
> end trace frame: Oxffffffff82512700, count: 0
> 
> >From the backtrace, I gather the following:
> 
> edp_panel_vdd_on() calls clear_delayed_work() which is just a macro

cancel_delayed_work()

> that calls task_del().  And for whatever reason the taskq passed to
> task_del() is NULL.  Maybe there is a missing INIT_DELAYED_WORK() call
> somewhere prior to this point?

the call to that is in
INIT_DELAYED_WORK(&intel_dp->panel_vdd_work, edp_panel_vdd_work);
but tq doesn't get set until work is scheduled as there are interfaces
to pick a tq when scheduling work.

So perhaps you want something like this to catch the cases where work is
cancelled before it is scheduled.

Index: sys/dev/pci/drm/include/linux/workqueue.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v
retrieving revision 1.4
diff -u -p -r1.4 workqueue.h
--- sys/dev/pci/drm/include/linux/workqueue.h   14 Feb 2021 03:42:55 -0000      
1.4
+++ sys/dev/pci/drm/include/linux/workqueue.h   6 May 2021 11:51:14 -0000
@@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq, 
 static inline void
 cancel_work_sync(struct work_struct *work)
 {
-       task_del(work->tq, &work->task);
+       if (work->tq != NULL)
+               task_del(work->tq, &work->task);
 }
 
 #define work_pending(work)     task_pending(&(work)->task)
@@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct
 static inline bool
 cancel_delayed_work(struct delayed_work *dwork)
 {
+       if (dwork->tq == NULL)
+               return false;
        if (timeout_del(&dwork->to))
                return true;
        return task_del(dwork->tq, &dwork->work.task);
@@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work 
 static inline bool
 cancel_delayed_work_sync(struct delayed_work *dwork)
 {
+       if (dwork->tq == NULL)
+               return false;
        if (timeout_del(&dwork->to))
                return true;
        return task_del(dwork->tq, &dwork->work.task);

Reply via email to