Hi Tanmay,
sorry for the delay. Please find some comments below...
On 12/10/25 19:28, Tanmay Shah wrote:
Hello, please check my comments below:
On 12/10/25 2:29 AM, Stefan Roese wrote:
Hi Tanmay,
On 12/10/25 03:51, Zhongqiu Han wrote:
On 12/5/2025 8:06 PM, Stefan Roese wrote:
Hi Tanmay,
On 12/4/25 17:45, Tanmay Shah wrote:
Hello,
Thank you for your patch. Please find my comments below.
On 12/4/25 4:40 AM, Stefan Roese wrote:
Testing on our ZynqMP platform has shown that some R5 messages might
get dropped under high CPU load. This patch creates a new high-prio
This commit text should be fixed. Messages are not dropped by Linux;
rather, the R5 can't send new messages because the rx vq is not
processed by Linux in time.
Agreed. I will change the commit message in the next patch revision.
Here, I would like to understand what is meant by "R5 messages
might get dropped".
Even under high CPU load, the messages from the R5 are stored in the
virtqueues. If Linux doesn't read them, they are not really lost/
dropped.
Could you please explain your use case in detail and how the
testing is conducted?
Our use case is that we send ~4k messages per second from the R5 to
Linux, sometimes even a bit more. Normally these messages are received
fine and no messages are dropped. Sometimes, in "high CPU load"
scenarios, the R5 has to drop messages because there is no free space
left in the RPMsg buffer, which is 256 entries AFAIU, resulting from
the Linux driver not emptying the RX queue. At ~4k messages per second,
those 256 entries cover only ~64 ms of Linux-side stall.
Thanks for the details. Your understanding is correct.
Could you please elaborate on these virtqueues a bit? Especially why
no message drops should happen because of them?
AFAIK, as a transport layer based on virtqueues, rpmsg is reliable once
a message has been successfully enqueued. The observed "drop" here
appears to be on the R5 side, where the application discards messages
when no free buffer entry is available.
Correct.
In the long run, while improving the Linux side is recommended,
Yes, please.
it could
also be helpful for the R5 side to implement strategies such as an
application-level buffer and retry mechanisms.
We already did this. We've added an additional buffer mechanism to the
R5, which improved this "message drop situation" a bit. Still, it did
not fix all our high-message-rate situations, which still result in
frame drops on the R5 side (the R5 is a bit resource-restricted).
Improving the responsiveness on the Linux side seems to be the best way
for us to deal with this problem.
I agree with this. However, I just want to understand and cover the
full picture here.
On the R5 side, I am assuming the open-amp library is used for the
RPMsg communication.
The rpmsg_send() API will end up here:
https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384
Here, if no new buffer is available, the R5 is supposed to wait 1 ms
before trying to send again. After 1 ms, the R5 will try to get a
buffer again, and this continues for up to 15 seconds. This is the
default mechanism.
Is this mechanism used correctly in your case?
We use rpmsg_trysend() to send data (messages):
- that means we try to write a message to the vq
- if that fails (queue full), we just add it to a software ringbuffer
(and try to send it on the next cycle)
- we cannot wait for the message queue to become "not full", because
the data to write to the rpmsg vq arrives cyclically every [ms] (so
we cannot wait for the rpmsg send to complete)
Alternatively, you can register a platform-specific wait mechanism via
this callback:
https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42
A few questions for further understanding:
1) As per your use case, must the 4k-per-second data transfer rate be
maintained all the time? And is this achieved with this patch?
Yes, the 4k messages/sec arrive from an external (sensor) system and
are then forwarded from the R5 to the A53. So the rate has to be
maintained all the time, as we have no control over the external
sensor originating these messages.
Even with the high-priority queue, if someone wants to achieve an 8k-
or 16k-per-second data transfer rate, at some point we will hit this
issue again.
Agreed. This current "solution" using a high-prio workqueue will very
likely not fix all use cases, especially when the message rate
increases even more for a longer time. This is not to be expected in
our system, though. We have run longer tests on our system without any
message drops (on the R5 side, of course) with this patch applied.
The reliable solution would be to keep the data transfer rate
reasonable and have a solid retry mechanism.
AFAIU, we do have a "solid retry mechanism" implemented with the
software ringbuffer that we added, as mentioned above. Still, the
resources on the R5 side are somewhat limited and we can't increase
the ringbuffer size much more. Additionally, we have requirements that
messages be received on the Linux A53 side without too much delay.
IMHO, this patch with "improved message receiving" in Linux seems to
be the best solution for us.
I am okay with taking this patch in after the comments below are
addressed, but please make sure everything discussed above on the R5
side is working as well.
Okay. Thanks for all your comments and input so far.
Thanks,
Stefan