On 3/31/2026 12:53 PM, Mathieu Poirier wrote:
> On Mon, Mar 30, 2026 at 01:43:03PM -0500, Shah, Tanmay wrote:
>> Hello,
>>
>> Thanks for the reviews. Please find my comments below:
>>
>> On 3/27/2026 2:58 PM, Mathieu Poirier wrote:
>>> On Tue, Mar 17, 2026 at 01:12:51PM -0700, Tanmay Shah wrote:
>>>> On AMD-Xilinx platforms cortex-A and cortex-R can be configured as
>>>> separate subsystems. In this case, both cores can boot independent of
>>>> each other. If Linux went through uncontrolled reboot during active
>>>> rpmsg communication, then during next boot it can find rpmsg virtio
>>>> status not in the reset state. In such case it is important to reset the
>>>> virtio status during attach callback and wait for sometime for the
>>>> remote to handle virtio driver reset.
>>>
>>> I understand the user case, but won't that situation happen regardless of
>>> whether cores operate sync or split mode?
>>>
>>
>> I want to make it clear that this is not same as Cortex-R cluster
>> configured as lockstep vs split mode.
>>
>> This is different configuration between Cortex-A cores and Cortex-R
>> cores. It is a firmware driver configuration of how it treats cortex-A
>> and Cortex-R subsystems.
>>
>> In the firmware driver, we can configure Cortex-A cluster as owner of
>> Cortex-R cluster, and in that case, if Cortex-A reboots, the firmware
>> will also reboot cortex-R cores. This policy makes Cortex-A as owner of
>> Cortex-R cores. In this configuraton this patch is not needed, because
>> if Cortex-A reboots then platform management firmware will also reboot
>> Cortex-R cores as well and vdev status flag will be always 0.
>>
>> In another configuration in the firmware driver, Cortex-R cores can be
>> independent of Cortex-A cores. This means, Cortex-A is not the owner of
>> the Cortex-R cores. Both are independent subsystem. Only in this
>> configuration, this patch applies because it is possible that Cortex-A
>> reboots while Cortex-R doesn't and Cortex-R still runs as it is.
>>
>> So only in the second type of configuration, this patch is needed when
>> COrtex-A running linux reboots and when driver probes and tries to
>> attach it can find that vdev flag is not reset. In the first
>> configuartion if linux reboots, then It's guranteed that vdev status
>> flag will always be in the reset state.
>>
>> If you prefer I can extend the commit message with all above details or
>> put as comment in the attach() callback. Let me know which do you prefer.
>
> Ok, that clarifies a lot of things. Please add the above as a comment in
> attach().
>
>>
>>>>
>>>> Signed-off-by: Tanmay Shah <[email protected]>
>>>> ---
>>>> drivers/remoteproc/xlnx_r5_remoteproc.c | 46 +++++++++++++++++++++++++
>>>> 1 file changed, 46 insertions(+)
>>>>
>>>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>> b/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>> index 50a9974f3202..f08806f13800 100644
>>>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
>>>> @@ -5,6 +5,7 @@
>>>> */
>>>>
>>>> #include <dt-bindings/power/xlnx-zynqmp-power.h>
>>>> +#include <linux/delay.h>
>>>> #include <linux/dma-mapping.h>
>>>> #include <linux/firmware/xlnx-zynqmp.h>
>>>> #include <linux/kernel.h>
>>>> @@ -29,6 +30,8 @@
>>>> #define RSC_TBL_XLNX_MAGIC ((uint32_t)'x' << 24 | (uint32_t)'a' <<
>>>> 16 | \
>>>> (uint32_t)'m' << 8 | (uint32_t)'p')
>>>>
>>>> +#define RPROC_ATTACH_TIMEOUT_US (100 * 1000)
>>>> +
>>>
>>> There are some time constant already defined, please use them.
>>
>> Ack.
>>
>>>
>>>> /*
>>>> * settings for RPU cluster mode which
>>>> * reflects possible values of xlnx,cluster-mode dt-property
>>>> @@ -865,6 +868,49 @@ static int zynqmp_r5_get_rsc_table_va(struct
>>>> zynqmp_r5_core *r5_core)
>>>>
>>>> static int zynqmp_r5_attach(struct rproc *rproc)
>>>> {
>>>> + struct device *dev = &rproc->dev;
>>>> + bool wait_for_remote = false;
>>>> + struct fw_rsc_vdev *rsc;
>>>> + struct fw_rsc_hdr *hdr;
>>>> + int i, offset, avail;
>>>> +
>>>> + if (!rproc->table_ptr)
>>>> + goto attach_success;
>>>> +
>>>> + for (i = 0; i < rproc->table_ptr->num; i++) {
>>>> + offset = rproc->table_ptr->offset[i];
>>>> + hdr = (void *)rproc->table_ptr + offset;
>>>> + avail = rproc->table_sz - offset - sizeof(*hdr);
>>>> + rsc = (void *)hdr + sizeof(*hdr);
>>>> +
>>>> + /* make sure table isn't truncated */
>>>> + if (avail < 0) {
>>>> + dev_err(dev, "rsc table is truncated\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + if (hdr->type != RSC_VDEV)
>>>> + continue;
>>>> +
>>>> + /*
>>>> + * reset vdev status, in case previous run didn't leave it in
>>>> + * a clean state.
>>>> + */
>>>> + if (rsc->status) {
>>>> + rsc->status = 0;
>>>> + wait_for_remote = true;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + /* kick remote to notify about attach */
>>>> + rproc->ops->kick(rproc, 0);
>>>> +
>>>> + /* wait for sometime until remote is ready */
>>>> + if (wait_for_remote)
>>>> + usleep_range(100, RPROC_ATTACH_TIMEOUT_US);
>>>
>>> Instead of waiting, would it be possible to return -EPROBE_DEFER and let the
>>> driver core retry mechanic do it's work?
>>>
>>
>> It is not possible to do -EPROBE_DEFER, because attach() callback is not
>> called only during driver probe.
>>
>> It is also called during following command sequence:
>>
>> attach() -> detach() -> attach()
>>
>> During second attach() callback, we can't do -EPROBE_DEFER, as it's not
>> driver probe anymore. So I think will have to keep the usleep_range().
>
> Right, but in this case the Cortex-A did not go through an uncontrolled
> reboot,
> we know the state of the machine is sound. Do you see a scenario where it
> would
> not be the case?
>
Yes correct. We will hit this issue only during probe, after that as
long as detach() happens we are setting vdev status to 0.
Another problem with the -EPROBE_DEFER mechanism is that the time
between return from attach() and next attach() isn't fixed. after
deferring current probe, when next probe and attach() happens, we will
always find vdev status to 0, even if remote hasn't handled vdev reset
event. So we don't know if the remote has handled virtio reset flag
notification or not. Where 100ms fixed delay, gives fixed time to remote
to handle vdev reset event. If needed this delay can be increased later.
This brings up another question to make the solution more robust. Do we
have any standard way of handling such a situation? Like in other virtio
standards, can this situation happen where driver comes up and finds the
virtio device status not in the reset state? How do they handle it?
Also, is firmware required to restore the resource table to default or
initial resource table after getting the virtio reset notification? Any
standard decided for remtoeproc virtio devices around this?
Thanks,
Tanmay
>>
>>> Thanks,
>>> Mathieu
>>>
>>>> +
>>>> +attach_success:
>>>> dev_dbg(&rproc->dev, "rproc %d attached\n", rproc->index);
>>>>
>>>> return 0;
>>>>
>>>> base-commit: d4ef36fbd57e610d4c334123ce706a2a71187cae
>>>> --
>>>> 2.34.1
>>>>
>>