On 10/17/25 10:35 AM, Mathieu Poirier wrote:
On Thu, Oct 16, 2025 at 11:12:26AM -0500, Tanmay Shah wrote:
Hello,
Please find my comments below:
On 10/16/25 10:12 AM, Mathieu Poirier wrote:
Good morning,
On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote:
Current recovery operation does only virtio device reset, but do not
free and re-allocate all the resources. As third-party is booting the
remote processor during attach-detach, it is better to free and
re-allocate resoruces as resource table state might be unknown to linux
when remote processor boots and reports crash.
1) When referring to "third-party", should I assume boot loader?
Here, "third-party" could be a bootloader or another core in a heterogeneous
system. In my-case it is a platform management controller.
Ok
2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls
rproc_reset_rsc_table_on_detach(). That function deals explicitly with the
resource table.
As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean
resource table, that sets vring addresses to 0xffffffff. Please let me know
if this understanding is not correct.
If we do not, call rproc_attach(), then correct vring addresses are not
setup in the resource table for next attach to work. Because,
rproc_handle_resources() and rproc_alloc_registered_carveouts() are not
called as part __rproc_attach().
Your assessment is correct. When the clean_table was introduced, it was to
address the detach->attach scenario. At that time the only recovery we
supported was to stop and start again, which did not involved the clean_table.
Re-attaching on crash was introduced later in a scenario that may not have
included a resource table.
Okay that explains the current architecture.
3) The code in this patch mixes __rproc_detach() with rproc_attach(), something
that is likely not a good idea. We either do __rproc_detach/__rproc_attach or
rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the
amount of possible states to a minimum.
I agree to this. I can find a way to call rproc_detach() and rproc_attach()
sequentially, instead of __rproc_detach() and rproc_attach() calls. I might
have to remove rproc_trigger_attach_recovery completely, but that is
implementation details. We can work it out later, once we agree to the
current problem & solution.
Humm... You might just be able to call rproc_detach/rproc_attach from
rproc_attach_recovery() if you enhance rproc_detach to be called in a CRASHED
context [1]. Let's see what you find when trying this on real HW.
[1].
https://elixir.bootlin.com/linux/v6.17.1/source/drivers/remoteproc/remoteproc_core.c#L2065
Thank You for the suggestion. Agreed. Since we are coming to an
agreement on the final solution, I will send the actual series which
also takes care of start/stop recovery on xlnx platform driver. I will
implement above suggestion, and test on HW.
Thanks,
Tanmay
If I understand correctly, the main motivation for this patch is the management
of the resource table. But as noted in (2), this should be taken care of. Am I
missing some information?
The main motivation is to make the attach operation works during
attach_recovery(). The __rproc_detach() works as expected, but attach
doesn't work. After recovery, I am not able to strat RPMsg communication.
Please let me know if I am missing something.
Thanks,
Tanmay
Thanks,
Mathieu
Signed-off-by: Tanmay Shah <[email protected]>
---
Note: RFC patch for design discussion. Please do not merge.
drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c
b/drivers/remoteproc/remoteproc_core.c
index 825672100528..4971508bc5b2 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
if (ret)
return ret;
- return __rproc_attach(rproc);
+ /* clean up all acquired resources */
+ rproc_resource_cleanup(rproc);
+
+ /* release HW resources if needed */
+ rproc_unprepare_device(rproc);
+
+ rproc_disable_iommu(rproc);
+
+ /* Free the copy of the resource table */
+ kfree(rproc->cached_table);
+ rproc->cached_table = NULL;
+ rproc->table_ptr = NULL;
+
+ return rproc_attach(rproc);
}
static int rproc_boot_recovery(struct rproc *rproc)
base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
--
2.34.1