On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote:
> Hi,
>
> On a hunch I added additional parameter checks to task_add(9) and
> task_del(9) and caught intel(4) doing something strange.
>
> The patch is straightforward: check that the taskq pointer tq is not
> NULL. In the current code we return early if a flag is set or cleared
> in the task w, in which case we don't catch bogus taskq inputs, which
> is why the machine boots fine without the extra checks.
>
> The patch:
>
> Index: kern_task.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_task.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 kern_task.c
> --- kern_task.c 1 Aug 2020 08:40:20 -0000 1.31
> +++ kern_task.c 5 May 2021 21:29:08 -0000
> @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task *
> {
> int rv = 0;
>
> + if (tq == NULL)
> + panic("%s: NULL taskq", __func__);
> +
> if (ISSET(w->t_flags, TASK_ONQUEUE))
> return (0);
>
> @@ -378,6 +381,9 @@ int
> task_del(struct taskq *tq, struct task *w)
> {
> int rv = 0;
> +
> + if (tq == NULL)
> + panic("%s: NULL taskq", __func__);
>
> if (!ISSET(w->t_flags, TASK_ONQUEUE))
> return (0);
>
> And here is the panic on my machine. I had to reconstruct it from
> OCR, the machine has no serial port, sorry if there are typos.
boot crash can be helpful for such machines
>
> panic: task_del: NULL taskq
> Stopped at db_enter+0xa: popq %rbp
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> 513524 44883 0 Ox14000 0x200 3 update
> 352928 82402 0 0x14000 0x200 2 cleaner
> 382195 66035 0 Ox14000 0x200 1 reaper
> ...
> db_enter() at db_enter+Oxa
> panic(ffffffff81db24fb) at panic+0x12f
> task del(0,ffff8000010633e0) at task_del+Oxa8
> edp_panel vdd_on(ffff800001063128) at edp_panel_vdd_on+0x6a
> intel_dp_aux_xfer(ffff800001063128, ffffffff82512a20,4, ffffffff82512400,2,0)
> at intel_dp_aux_xfer+0x18b
> intel_dp_aux_transfer(ffff8000010631e8, ffffffff82512a88) at
> intel_dp_aux_transfer+0x183
> drm_dp_dpcd_access(ffff8000010631e8,9,0,ffff80000106313a, 1) at
> drm_dp_dpcd_access+Oxa9
> drm_dp_dpcd_read(ffff8000010631e8,0,ffff80000106313a, f) at
> drm_dp_dpcd_read+0x61
> intel_dp_read_dpcd(ffff800001063128) at intel_dp_read_dpcd+0x45
> intel_dp_init_connector(ffff800001063000, ffff800001064000) at
> intel_dp_init_connector+0x988
> intel_ddi_init(ffff800000272000,0) at intel_ddi_init+0x454
> intel_modeset_init(ffff800000272000) at intel_modeset_init+0x1c9f
> i915_driver_probe(ffff800000272000, ffffffff82052f98) at
> i915_driver_probe+0x7df
> inteldrm_attachhook(ffff800000272000) at inteldrm_attachhook+0x46
> end trace frame: Oxffffffff82512700, count: 0
>
> >From the backtrace, I gather the following:
>
> edp_panel_vdd_on() calls clear_delayed_work() which is just a macro
cancel_delayed_work()
> that calls task_del(). And for whatever reason the taskq passed to
> task_del() is NULL. Maybe there is a missing INIT_DELAYED_WORK() call
> somewhere prior to this point?
the call to that is in
INIT_DELAYED_WORK(&intel_dp->panel_vdd_work, edp_panel_vdd_work);
but tq doesn't get set until work is scheduled as there are interfaces
to pick a tq when scheduling work.
So perhaps you want something like this to catch the cases where work is
cancelled before it is scheduled.
Index: sys/dev/pci/drm/include/linux/workqueue.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v
retrieving revision 1.4
diff -u -p -r1.4 workqueue.h
--- sys/dev/pci/drm/include/linux/workqueue.h 14 Feb 2021 03:42:55 -0000
1.4
+++ sys/dev/pci/drm/include/linux/workqueue.h 6 May 2021 11:51:14 -0000
@@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq,
static inline void
cancel_work_sync(struct work_struct *work)
{
- task_del(work->tq, &work->task);
+ if (work->tq != NULL)
+ task_del(work->tq, &work->task);
}
#define work_pending(work) task_pending(&(work)->task)
@@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct
static inline bool
cancel_delayed_work(struct delayed_work *dwork)
{
+ if (dwork->tq == NULL)
+ return false;
if (timeout_del(&dwork->to))
return true;
return task_del(dwork->tq, &dwork->work.task);
@@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work
static inline bool
cancel_delayed_work_sync(struct delayed_work *dwork)
{
+ if (dwork->tq == NULL)
+ return false;
if (timeout_del(&dwork->to))
return true;
return task_del(dwork->tq, &dwork->work.task);