On Tue, 10 Mar 2026 11:03:21 +0100, Jingyi Wang <[email protected]> said: > rproc_add() called by rproc probe function failure will tear down all > the resources including do device_del() and remove subdev etc. If > rproc_report_crash() is called in this path, the rproc_crash_handler_work > could be excuted asynchronously, rproc_boot_recovery()->rproc_stop() will > be called with recovery enabled, which may cause NULL pointer dereference > if the resource has already been cleaned up. > > [ 5.251483] Unable to handle kernel NULL pointer dereference at virtual > address 0000000000000300 > [ 5.260499] Mem abort info: > [ 5.263384] ESR = 0x0000000096000006 > [ 5.267248] EC = 0x25: DABT (current EL), IL = 32 bits > [ 5.272711] SET = 0, FnV = 0 > [ 5.275865] EA = 0, S1PTW = 0 > [ 5.279106] FSC = 0x06: level 2 translation fault > [ 5.284125] Data abort info: > [ 5.287101] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 > [ 5.292742] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > [ 5.297939] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > [ 5.303400] user pgtable: 4k pages, 48-bit VAs, pgdp=000000089e086000 > [ 5.310022] [0000000000000300] pgd=080000089e087403, p4d=080000089e087403, > pud=080000089e088403, pmd=0000000000000000 > [ 5.320917] Internal error: Oops: 0000000096000006 [#1] SMP > [ 5.392494] Hardware name: Qualcomm Technologies, Inc. Kaanapali QRD (DT) > [ 5.399466] Workqueue: rproc_recovery_wq rproc_crash_handler_work > [ 5.405729] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) > [ 5.412879] pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] > [ 5.419674] lr : glink_subdev_stop+0x1c/0x30 [qcom_common] > [ 5.425308] sp : ffff800080ffbc90 > [ 5.428724] x29: ffff800080ffbc90 x28: ffff00081be833f0 x27: > ffff000800059c00 > [ 5.436053] x26: 0000000000000000 x25: ffff000800a56f80 x24: > 61c8864680b583eb > [ 5.443384] x23: ffff00081be83038 x22: 0000000000000001 x21: > ffff00081be83000 > [ 5.450714] x20: ffff00081be833c0 x19: 0000000000000000 x18: > 0000000000000010 > [ 5.458043] x17: 0000000000000000 x16: 0000000000000000 x15: > ffff0008042684f8 > [ 5.465374] x14: 00000000000002dd x13: ffff0008042684f8 x12: > ffffd37f69f967a0 > [ 5.472705] x11: ffffd37f6a006800 x10: ffffd37f69fee7c0 x9 : > ffffd37f69fee818 > [ 5.480036] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : > 0000000000000001 > [ 5.487366] x5 : ffff000d6536d408 x4 : 0000000000000001 x3 : > 0000000000000000 > [ 5.494697] x2 : ffffd37f5703c18c x1 : 0000000000000001 x0 : > 0000000000000000 > [ 5.502028] Call trace: > [ 5.504549] qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P) > [ 5.511344] glink_subdev_stop+0x1c/0x30 [qcom_common] > [ 5.516622] rproc_stop+0x58/0x17c > [ 5.520127] rproc_trigger_recovery+0xb0/0x150 > [ 5.524693] rproc_crash_handler_work+0xa4/0xc4 > [ 5.529346] process_scheduled_works+0x18c/0x2d8 > [ 5.534092] worker_thread+0x144/0x280 > [ 5.537952] kthread+0x124/0x138 > [ 5.541280] ret_from_fork+0x10/0x20 > [ 5.544965] Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000) > [ 5.551224] ---[ end trace 0000000000000000 ]--- > > So set recovery_disabled during rproc_add(). > > Signed-off-by: Jingyi Wang <[email protected]> > --- > drivers/remoteproc/remoteproc_core.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/remoteproc/remoteproc_core.c > b/drivers/remoteproc/remoteproc_core.c > index b087ed21858a..f66dde712cec 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -2286,7 +2286,10 @@ int rproc_add(struct rproc *rproc) > { > struct device *dev = &rproc->dev; > int ret; > + bool rproc_recovery_save; > > + rproc_recovery_save = rproc->recovery_disabled; > + rproc->recovery_disabled = true; > ret = rproc_validate(rproc); > if (ret < 0) > return ret; > @@ -2319,6 +2322,7 @@ int rproc_add(struct rproc *rproc) > list_add_rcu(&rproc->node, &rproc_list); > mutex_unlock(&rproc_list_mutex); > > + rproc->recovery_disabled = rproc_recovery_save; > return 0; > > rproc_remove_dev: > > -- > 2.25.1 > >
Ideally things like this would be passed to the rproc core in some kind of a config structure and only set when registration succeeds. This looks to me like papering over the real issue and I think it's still racy as there's no true synchronization. Wouldn't it be better to take rproc->lock for the entire duration of rproc_add()? It's already initialized in rproc_alloc(). Bart

