On 3/19/2026 1:23 PM, Dmitry Baryshkov wrote:
On Thu, Mar 19, 2026 at 12:36:15PM +0800, Jingyi Wang wrote:
On 3/13/2026 10:37 AM, Dmitry Baryshkov wrote:
On Wed, Mar 11, 2026 at 01:39:50AM -0700, Bartosz Golaszewski wrote:
On Wed, 11 Mar 2026 03:11:42 +0100, Dmitry Baryshkov
<[email protected]> said:
On Tue, Mar 10, 2026 at 06:50:30AM -0700, Bartosz Golaszewski wrote:
Ideally things like this would be passed to the rproc core in some kind of a
config structure and only set when registration succeeds. This looks to me
like papering over the real issue and I think it's still racy as there's no
true synchronization.
Wouldn't it be better to take rproc->lock for the entire duration of
rproc_add()? It's already initialized in rproc_alloc().
It would still be racy as rproc_trigger_recovery() is called outside of
the lock. Instead the error cleanup path (and BTW, rproc_del() path too)
must explicitly call cancel_work_sync() on the crash_handler work (and
any other work items that can be scheduled).
This looks weird TBH. For example: rproc_crash_handler_work() takes the lock,
but releases it right before calling inspecting rproc->recovery_disabled and
calling rproc_trigger_recovery(). It looks wrong, I think it should keep the
lock and rptoc_trigger_recovery() should enforce it being taken before the
call.
Yes. Nevertheless the driver should cancel the work too.
Hi Dmitry & Bartosz,
rproc_crash_handler_work() may call rproc_trigger_recovery() and
rproc_add() may call rproc_boot(), both the function have already
hold the lock. And the lock cannot protect resources like glink_subdev
in the patch.
And there is a possible case for cancel_work, rproc_add tear down call
cancel work and wait for the work finished, the reboot run successfully,
and the tear down continued and the resources all released, including sysfs
and glink_subdev.
Indeed recovery_disabled is kind of hacky.
The root cause for this issue is that for remoteproc with RPROC_OFFLINE
state, the rproc_start will be called asynchronously, but for the remoteproc
with RPROC_DETACHED, the attach function is called directly, the failure
in this path will cause the rproc_add() fail and the resource release.
I think the current patch can be dropped, we are thinking about make
rproc_attach
called asynchronously to avoid this race.
Isn't this patch necessary for SoCCP bringup? If not, why did you
include it into the series?
yes, will squash to soccp patch in next versoin.
Thanks,
Jingyi