On Thu, Oct 16, 2025 at 11:12:26AM -0500, Tanmay Shah wrote: > > Hello, > > Please find my comments below: > > On 10/16/25 10:12 AM, Mathieu Poirier wrote: > > Good morning, > > > > On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote: > > > Current recovery operation does only virtio device reset, but do not > > > free and re-allocate all the resources. As third-party is booting the > > > remote processor during attach-detach, it is better to free and > > > re-allocate resoruces as resource table state might be unknown to linux > > > when remote processor boots and reports crash. > > > > 1) When referring to "third-party", should I assume boot loader? > > Here, "third-party" could be a bootloader or another core in a heterogeneous > system. In my-case it is a platform management controller.
Ok > > > > 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn > > calls > > rproc_reset_rsc_table_on_detach(). That function deals explicitly with the > > resource table. > > As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean > resource table, that sets vring addresses to 0xffffffff. Please let me know > if this understanding is not correct. > > If we do not, call rproc_attach(), then correct vring addresses are not > setup in the resource table for next attach to work. Because, > rproc_handle_resources() and rproc_alloc_registered_carveouts() are not > called as part __rproc_attach(). Your assessment is correct. When the clean_table was introduced, it was to address the detach->attach scenario. At that time the only recovery we supported was to stop and start again, which did not involved the clean_table. Re-attaching on crash was introduced later in a scenario that may not have included a resource table. > > > 3) The code in this patch mixes __rproc_detach() with rproc_attach(), > > something > > that is likely not a good idea. We either do __rproc_detach/__rproc_attach > > or > > rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep > > the > > amount of possible states to a minimum. > > > > I agree to this. I can find a way to call rproc_detach() and rproc_attach() > sequentially, instead of __rproc_detach() and rproc_attach() calls. I might > have to remove rproc_trigger_attach_recovery completely, but that is > implementation details. We can work it out later, once we agree to the > current problem & solution. > Humm... You might just be able to call rproc_detach/rproc_attach from rproc_attach_recovery() if you enhance rproc_detach to be called in a CRASHED context [1]. Let's see what you find when trying this on real HW. [1]. https://elixir.bootlin.com/linux/v6.17.1/source/drivers/remoteproc/remoteproc_core.c#L2065 > > If I understand correctly, the main motivation for this patch is the > > management > > of the resource table. But as noted in (2), this should be taken care of. > > Am I > > missing some information? > > > > The main motivation is to make the attach operation works during > attach_recovery(). The __rproc_detach() works as expected, but attach > doesn't work. After recovery, I am not able to strat RPMsg communication. > > Please let me know if I am missing something. > > Thanks, > Tanmay > > > Thanks, > > Mathieu > > > > > > > > Signed-off-by: Tanmay Shah <[email protected]> > > > --- > > > > > > Note: RFC patch for design discussion. Please do not merge. > > > > > > drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++- > > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > > b/drivers/remoteproc/remoteproc_core.c > > > index 825672100528..4971508bc5b2 100644 > > > --- a/drivers/remoteproc/remoteproc_core.c > > > +++ b/drivers/remoteproc/remoteproc_core.c > > > @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc > > > *rproc) > > > if (ret) > > > return ret; > > > - return __rproc_attach(rproc); > > > + /* clean up all acquired resources */ > > > + rproc_resource_cleanup(rproc); > > > + > > > + /* release HW resources if needed */ > > > + rproc_unprepare_device(rproc); > > > + > > > + rproc_disable_iommu(rproc); > > > + > > > + /* Free the copy of the resource table */ > > > + kfree(rproc->cached_table); > > > + rproc->cached_table = NULL; > > > + rproc->table_ptr = NULL; > > > + > > > + return rproc_attach(rproc); > > > } > > > static int rproc_boot_recovery(struct rproc *rproc) > > > > > > base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac > > > -- > > > 2.34.1 > > > >

