On Tue, Jun 09, 2026 at 01:43:17PM +0200, Stephan Gerhold wrote:
> On Tue, Jun 09, 2026 at 03:52:52PM +0530, Mukesh Ojha wrote:
> > If a subdevice fails to stop, it indicates broken communication with the
> > DSP. Continuing to stop further subdevices against an unresponsive
> > remote processor could close rpmsg devices that could remove the memory
> > mapping from HLOS and in case if remote processor touches those memory
> > can result in SMMU fault.
> >
> > Change rproc_stop_subdevices() to return int and abort on the first
> > failing subdev. Propagate the error through rproc_stop() and
> > __rproc_detach() so callers are aware the teardown did not complete
> > cleanly.
> >
> > Signed-off-by: Mukesh Ojha <[email protected]>
>
> But what would callers do about this? If you abort the teardown sequence
> half-way through you now have an inconsistent half-stopped state that
> neither a new call to stop() nor a new call to start() could recover
> from. That doesn't sound much better than the SMMU fault. Or am I
> missing something here?

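To make the state being discussed concrete, here is a minimal user-space
sketch of "stop in reverse order and abort on the first failure". It is a
model only, not the remoteproc code: struct model_subdev, its fields and
model_stop_subdevices() are hypothetical names, and a failing stop is
simulated with a preset return value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a subdevice; not the kernel's struct rproc_subdev. */
struct model_subdev {
	const char *name;
	int stop_ret;	/* simulated result of the stop step, 0 on success */
	int stopped;	/* set to 1 once the stop step has succeeded */
};

/*
 * Walk the subdevices in the reverse of their start order and give up on
 * the first one that fails to stop. Subdevices that were not reached
 * remain in their started state.
 */
static int model_stop_subdevices(struct model_subdev *subdevs, size_t count)
{
	for (size_t i = count; i-- > 0; ) {
		if (subdevs[i].stop_ret)
			return subdevs[i].stop_ret;	/* abort, leave the rest untouched */
		subdevs[i].stopped = 1;
	}

	return 0;
}
```

With the array filled in start order (glink, sysmon, ssr) and sysmon set to
fail with -ETIMEDOUT, ssr ends up stopped while sysmon and glink do not.
That is the half-stopped state raised above, and it is also why glink is
not torn down in this model.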
An SMMU fault results in a device crash, whereas the other outcome is only
a non-functional remote processor. From the Linux side, we do not know the
state of the remote processor when the timeout happens, and cleaning up the
subdevices can result in the debug data for the hung remote processor being
lost.

>
> I would expect that we should either be able to tolerate the SMMU faults
> with the resets involved in the remoteproc stop/start sequence, or that
> DMA gets cancelled by the remoteproc stop sequence, before the buffers
> are unmapped. Perhaps the order of our stop sequence is just wrong? Can
> we unmap the buffers in the subdev unprepare() callback?

IMO, the subdevice sequence is fine:

  start: glink -> sysmon -> ssr
  stop:  ssr -> sysmon -> glink

The issue happens because the glink subdevice gets cleared, so that would
not help, as we are ignoring the timeout.

> Thanks,
> Stephan

--
-Mukesh Ojha

