On Thu, Jun 11, 2026 at 03:18:51PM +0530, Mukesh Ojha wrote:
> On Tue, Jun 09, 2026 at 01:43:17PM +0200, Stephan Gerhold wrote:
> > On Tue, Jun 09, 2026 at 03:52:52PM +0530, Mukesh Ojha wrote:
> > > If a subdevice fails to stop, it indicates broken communication with the
> > > DSP. Continuing to stop further subdevices against an unresponsive
> > > remote processor could close rpmsg devices that could remove the memory
> > > mapping from HLOS and in case if remote processor touches those memory
> > > can result in SMMU fault.
> > >
> > > Change rproc_stop_subdevices() to return int and abort on the first
> > > failing subdev. Propagate the error through rproc_stop() and
> > > __rproc_detach() so callers are aware the teardown did not complete
> > > cleanly.
> > >
> > > Signed-off-by: Mukesh Ojha <[email protected]>
> >
> > But what would callers do about this? If you abort the teardown sequence
> > half-way through you now have an inconsistent half-stopped state that
> > neither a new call to stop() nor a new call to start() could recover
> > from. That doesn't sound much better than the SMMU fault. Or am I
> > missing something here?
>
> SMMU fault result in device crash while other is non-functional remote
> processor. From Linux side, we do not know the state of remote processor
> when the timeout happens..cleaning the subdevices can result in the
> debug data being lost for hung remote processor.
>

Ok, but how do we go from here? Do we expect that the system would have
some userspace monitoring daemon that would collect the debug data and
then reboot the device to make the remoteproc work again?

With these changes, I don't see how you would start the remoteproc again
without fully rebooting the board. Calling start()/stop() on the
subdevices again would lead to crashes because some of them are in
started state and some of them are in stopped state and we don't even
know which one is in which state.

Thanks,
Stephan
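
For illustration only, the mixed state described above can be sketched
in a small user-space model. This is not the remoteproc code: struct
demo_subdev, demo_subdev_stop(), demo_stop_subdevices() and
demo_count_started() are hypothetical names, and the reverse teardown
order is an assumption made for the example. It only shows that
aborting on the first failing stop leaves earlier entries stopped and
the rest still marked as started.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a subdevice; not the kernel's struct rproc_subdev. */
struct demo_subdev {
	const char *name;
	bool started;
	bool stop_fails;	/* simulates a stop that times out */
};

/* Returns 0 on success, -1 if the simulated remote side does not respond. */
static int demo_subdev_stop(struct demo_subdev *sd)
{
	if (sd->stop_fails)
		return -1;	/* real state of this subdevice is now unknown */
	sd->started = false;
	return 0;
}

/*
 * Stop subdevices in reverse order (assumed teardown order) and abort
 * on the first failure, leaving the remaining entries untouched.
 */
static int demo_stop_subdevices(struct demo_subdev *subdevs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		int ret = demo_subdev_stop(&subdevs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Count subdevices still marked as started after a stop attempt. */
static size_t demo_count_started(const struct demo_subdev *subdevs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (subdevs[i].started)
			count++;
	return count;
}
```

With three started subdevices where the middle one fails to stop, the
last one ends up stopped while the first two remain marked as started,
and a caller that only sees the error return cannot tell which is
which.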

