On 4/1/2026 10:23 AM, Shah, Tanmay wrote: > > > On 3/31/2026 12:53 PM, Mathieu Poirier wrote: >> On Mon, Mar 30, 2026 at 01:43:03PM -0500, Shah, Tanmay wrote: >>> Hello, >>> >>> Thanks for the reviews. Please find my comments below: >>> >>> On 3/27/2026 2:58 PM, Mathieu Poirier wrote: >>>> On Tue, Mar 17, 2026 at 01:12:51PM -0700, Tanmay Shah wrote: >>>>> On AMD-Xilinx platforms cortex-A and cortex-R can be configured as >>>>> separate subsystems. In this case, both cores can boot independent of >>>>> each other. If Linux went through uncontrolled reboot during active >>>>> rpmsg communication, then during next boot it can find rpmsg virtio >>>>> status not in the reset state. In such case it is important to reset the >>>>> virtio status during attach callback and wait for sometime for the >>>>> remote to handle virtio driver reset. >>>> >>>> I understand the user case, but won't that situation happen regardless of >>>> whether cores operate sync or split mode? >>>> >>> >>> I want to make it clear that this is not same as Cortex-R cluster >>> configured as lockstep vs split mode. >>> >>> This is different configuration between Cortex-A cores and Cortex-R >>> cores. It is a firmware driver configuration of how it treats cortex-A >>> and Cortex-R subsystems. >>> >>> In the firmware driver, we can configure Cortex-A cluster as owner of >>> Cortex-R cluster, and in that case, if Cortex-A reboots, the firmware >>> will also reboot cortex-R cores. This policy makes Cortex-A as owner of >>> Cortex-R cores. In this configuraton this patch is not needed, because >>> if Cortex-A reboots then platform management firmware will also reboot >>> Cortex-R cores as well and vdev status flag will be always 0. >>> >>> In another configuration in the firmware driver, Cortex-R cores can be >>> independent of Cortex-A cores. This means, Cortex-A is not the owner of >>> the Cortex-R cores. Both are independent subsystem. Only in this >>> configuration, this patch applies because it is possible that Cortex-A >>> reboots while Cortex-R doesn't and Cortex-R still runs as it is. >>> >>> So only in the second type of configuration, this patch is needed when >>> COrtex-A running linux reboots and when driver probes and tries to >>> attach it can find that vdev flag is not reset. In the first >>> configuartion if linux reboots, then It's guranteed that vdev status >>> flag will always be in the reset state. >>> >>> If you prefer I can extend the commit message with all above details or >>> put as comment in the attach() callback. Let me know which do you prefer. >> >> Ok, that clarifies a lot of things. Please add the above as a comment in >> attach(). >> >>> >>>>> >>>>> Signed-off-by: Tanmay Shah <[email protected]> >>>>> --- >>>>> drivers/remoteproc/xlnx_r5_remoteproc.c | 46 +++++++++++++++++++++++++ >>>>> 1 file changed, 46 insertions(+) >>>>> >>>>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c >>>>> b/drivers/remoteproc/xlnx_r5_remoteproc.c >>>>> index 50a9974f3202..f08806f13800 100644 >>>>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c >>>>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c >>>>> @@ -5,6 +5,7 @@ >>>>> */ >>>>> >>>>> #include <dt-bindings/power/xlnx-zynqmp-power.h> >>>>> +#include <linux/delay.h> >>>>> #include <linux/dma-mapping.h> >>>>> #include <linux/firmware/xlnx-zynqmp.h> >>>>> #include <linux/kernel.h> >>>>> @@ -29,6 +30,8 @@ >>>>> #define RSC_TBL_XLNX_MAGIC ((uint32_t)'x' << 24 | (uint32_t)'a' << >>>>> 16 | \ >>>>> (uint32_t)'m' << 8 | (uint32_t)'p') >>>>> >>>>> +#define RPROC_ATTACH_TIMEOUT_US (100 * 1000) >>>>> + >>>> >>>> There are some time constant already defined, please use them. >>> >>> Ack. >>> >>>> >>>>> /* >>>>> * settings for RPU cluster mode which >>>>> * reflects possible values of xlnx,cluster-mode dt-property >>>>> @@ -865,6 +868,49 @@ static int zynqmp_r5_get_rsc_table_va(struct >>>>> zynqmp_r5_core *r5_core) >>>>> >>>>> static int zynqmp_r5_attach(struct rproc *rproc) >>>>> { >>>>> + struct device *dev = &rproc->dev; >>>>> + bool wait_for_remote = false; >>>>> + struct fw_rsc_vdev *rsc; >>>>> + struct fw_rsc_hdr *hdr; >>>>> + int i, offset, avail; >>>>> + >>>>> + if (!rproc->table_ptr) >>>>> + goto attach_success; >>>>> + >>>>> + for (i = 0; i < rproc->table_ptr->num; i++) { >>>>> + offset = rproc->table_ptr->offset[i]; >>>>> + hdr = (void *)rproc->table_ptr + offset; >>>>> + avail = rproc->table_sz - offset - sizeof(*hdr); >>>>> + rsc = (void *)hdr + sizeof(*hdr); >>>>> + >>>>> + /* make sure table isn't truncated */ >>>>> + if (avail < 0) { >>>>> + dev_err(dev, "rsc table is truncated\n"); >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>>> + if (hdr->type != RSC_VDEV) >>>>> + continue; >>>>> + >>>>> + /* >>>>> + * reset vdev status, in case previous run didn't leave it in >>>>> + * a clean state. >>>>> + */ >>>>> + if (rsc->status) { >>>>> + rsc->status = 0; >>>>> + wait_for_remote = true; >>>>> + break; >>>>> + } >>>>> + } >>>>> + >>>>> + /* kick remote to notify about attach */ >>>>> + rproc->ops->kick(rproc, 0); >>>>> + >>>>> + /* wait for sometime until remote is ready */ >>>>> + if (wait_for_remote) >>>>> + usleep_range(100, RPROC_ATTACH_TIMEOUT_US); >>>> >>>> Instead of waiting, would it be possible to return -EPROBE_DEFER and let >>>> the >>>> driver core retry mechanic do it's work? >>>> >>> >>> It is not possible to do -EPROBE_DEFER, because attach() callback is not >>> called only during driver probe. >>> >>> It is also called during following command sequence: >>> >>> attach() -> detach() -> attach() >>> >>> During second attach() callback, we can't do -EPROBE_DEFER, as it's not >>> driver probe anymore. So I think will have to keep the usleep_range(). >> >> Right, but in this case the Cortex-A did not go through an uncontrolled >> reboot, >> we know the state of the machine is sound. Do you see a scenario where it >> would >> not be the case? >> > > Yes correct. We will hit this issue only during probe, after that as > long as detach() happens we are setting vdev status to 0. > > Another problem with the -EPROBE_DEFER mechanism is that the time > between return from attach() and next attach() isn't fixed. after > deferring current probe, when next probe and attach() happens, we will > always find vdev status to 0, even if remote hasn't handled vdev reset > event. So we don't know if the remote has handled virtio reset flag > notification or not. Where 100ms fixed delay, gives fixed time to remote > to handle vdev reset event. If needed this delay can be increased later. > > This brings up another question to make the solution more robust. Do we > have any standard way of handling such a situation? Like in other virtio > standards, can this situation happen where driver comes up and finds the > virtio device status not in the reset state? How do they handle it? > > Also, is firmware required to restore the resource table to default or > initial resource table after getting the virtio reset notification? Any > standard decided for remtoeproc virtio devices around this? > > Thanks, > Tanmay > All, This issue was discussed on openamp-rp call, and it was concluded that the common solution that works for all vendors is needed for this problem. The design of the solution was discussed like this: 1) Resource table will have standard resource that will be used to ask device to reset itself. 2) Device will set the virtio status flag to 0 after successful reset. 3) Linux will poll non-zero virtio status until it is set to 0. If virtio status is not found 0, then driver probe will defer. I will implement above design and send new patch series. Please consider this patch series rejected. Thanks, Tanmay >>> >>>> Thanks, >>>> Mathieu >>>> >>>>> + >>>>> +attach_success: >>>>> dev_dbg(&rproc->dev, "rproc %d attached\n", rproc->index); >>>>> >>>>> return 0; >>>>> >>>>> base-commit: d4ef36fbd57e610d4c334123ce706a2a71187cae >>>>> -- >>>>> 2.34.1 >>>>> >>> >

