On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote:
> Hi Mathieu,
> 
> On 12/15/25 02:14, Mathieu Poirier wrote:
> > On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote:
> > > Hello, please check my comments below:
> > > 
> > > On 12/10/25 2:29 AM, Stefan Roese wrote:
> > > > Hi Tanmay,
> > > > 
> > > > On 12/10/25 03:51, Zhongqiu Han wrote:
> > > > > On 12/5/2025 8:06 PM, Stefan Roese wrote:
> > > > > > Hi Tanmay,
> > > > > > 
> > > > > > On 12/4/25 17:45, Tanmay Shah wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > Thank You for your patch. Please find my comments below.
> > > > > > > 
> > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote:
> > > > > > > > Testing on our ZynqMP platform has shown that some R5
> > > > > > > > messages might get dropped under high CPU load. This patch
> > > > > > > > creates a new high-prio
> > > > > > > 
> > > 
This commit text should be fixed. Messages are not dropped by Linux;
rather, the R5 can't send new messages because the rx vq is not being
processed by Linux.
> > > 
> > 
> > I agree.
> > > > > > > Here, I would like to understand what is meant by "R5
> > > > > > > messages might get dropped".
> > > > > > > 
> > > > > > > Even under high CPU load, the messages from R5 are stored in
> > > > > > > the virtqueues. If Linux doesn't read them, they are not
> > > > > > > really lost/dropped.
> > > > > > > 
> > > > > > > Could you please explain your use case in detail and how the
> > > > > > > testing is conducted?
> > > > > > 
> > > > > > Our use-case is that we send ~4k messages per second from the
> > > > > > R5 to Linux - sometimes even a bit more. Normally these messages
> > > > > > are received okay and no messages are dropped. Sometimes, under
> > > > > > "high CPU load" scenarios, the R5 has to drop messages because
> > > > > > there is no free space in the RPMsg buffer, which has 256
> > > > > > entries AFAIU. This results from the Linux driver not emptying
> > > > > > the RX queue.
> > > > > > 
> > > 
> > > Thanks for the details. Your understanding is correct.
> > > 
> > > > > > Could you please elaborate on these virtqueues a bit?
> > > > > > Especially why no message drops should happen because of these
> > > > > > virtqueues?
> > > > > 
> > > > > AFAIK, as a transport layer based on virtqueues, rpmsg is
> > > > > reliable once a message has been successfully enqueued. The
> > > > > observed "drop" here appears to be on the R5 side, where the
> > > > > application discards messages when no free buffer entry is
> > > > > available.
> > > > 
> > > > Correct.
> > > > 
> > > > > In the long run, while improving the Linux side is recommended,
> > > > 
> > > > Yes, please.
> > > > 
> > > > > it could
> > > > > also be helpful for the R5 side to implement strategies such as an
> > > > > application-level buffer and retry mechanisms.
> > > > 
> > > > We already did this. We've added an additional buffer mechanism to
> > > > the R5, which improved this "message drop situation" a bit. Still,
> > > > it did not fix all our high message rate situations - we continue to
> > > > see frame drops on the R5 side (the R5 is a bit
> > > > resource-restricted).
> > > > 
> > > > Improving the responsiveness on the Linux side seems to be the best way
> > > > for us to deal with this problem.
> > > > 
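For reference, an application-level staging buffer of the kind Stefan
describes can be sketched as a small single-producer/single-consumer ring
that counts drops instead of blocking the producer. This is a generic,
illustrative userspace sketch - all names and sizes here are made up and it
is not tied to the ZynqMP firmware:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STAGE_SLOTS    64   /* capacity; must be a power of two */
#define STAGE_MSG_SIZE 32   /* fixed payload copy size, chosen arbitrarily */

/* One staging slot: a fixed-size copy of a pending RPMsg payload. */
struct stage_slot {
	uint32_t len;
	uint8_t data[STAGE_MSG_SIZE];
};

/* SPSC ring: the producer is the data source on the R5, the consumer is
 * the code that calls rpmsg_send() whenever a TX buffer becomes free. */
struct stage_ring {
	struct stage_slot slots[STAGE_SLOTS];
	uint32_t head;    /* next slot to fill  */
	uint32_t tail;    /* next slot to drain */
	uint32_t dropped; /* messages discarded because the ring was full */
};

static bool stage_push(struct stage_ring *r, const void *msg, uint32_t len)
{
	if (len > STAGE_MSG_SIZE)
		return false;  /* oversized payloads rejected in this sketch */
	if (r->head - r->tail == STAGE_SLOTS) {
		r->dropped++;  /* full: count the drop rather than block */
		return false;
	}
	struct stage_slot *s = &r->slots[r->head & (STAGE_SLOTS - 1)];
	s->len = len;
	memcpy(s->data, msg, len);
	r->head++;
	return true;
}

static bool stage_pop(struct stage_ring *r, struct stage_slot *out)
{
	if (r->head == r->tail)
		return false;  /* empty */
	*out = r->slots[r->tail & (STAGE_SLOTS - 1)];
	r->tail++;
	return true;
}
```

The `dropped` counter makes the failure mode observable, which helps when
correlating R5-side drops with Linux-side CPU load.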
> > > 
> > > I agree with this. However, I just want to understand and cover the
> > > full picture here.
> > > 
> > > On the R5 side, I am assuming the open-amp library is used for the
> > > RPMsg communication.
> > > 
> > > rpmsg_send() API will end up here: 
> > > https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384
> > > 
> > > Here, if no new buffer is available, the R5 is supposed to wait for
> > > 1ms before sending a new message. After 1ms, the R5 will try to get
> > > a buffer again, and this continues for up to 15 seconds. This is the
> > > default mechanism.
> > > 
> > > Is this mechanism being used correctly in your case?
> > > 
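The bounded-wait behaviour Tanmay describes (retry every 1 ms, give up
after ~15 s) boils down to a loop like the following. This is an
illustrative userspace sketch, not the actual open-amp code - the function
and callback names are invented:

```c
#include <stdbool.h>

#define TICK_MS    1U      /* wait granularity (open-amp default is 1 ms) */
#define TIMEOUT_MS 15000U  /* total budget before giving up (~15 seconds) */

/* Hypothetical stand-ins for the transport: in open-amp these roles are
 * played by the TX buffer reservation and a platform sleep primitive. */
typedef bool (*try_get_buf_fn)(void *ctx);
typedef void (*wait_ms_fn)(unsigned int ms);

/* Poll for a free TX buffer every TICK_MS, up to TIMEOUT_MS in total.
 * Returns the number of ticks waited on success, or -1 on timeout, at
 * which point the caller must drop the message or queue it elsewhere. */
static int acquire_tx_buffer(try_get_buf_fn try_get, void *ctx,
			     wait_ms_fn wait_ms)
{
	for (unsigned int waited = 0; waited <= TIMEOUT_MS;
	     waited += TICK_MS) {
		if (try_get(ctx))
			return (int)(waited / TICK_MS);
		wait_ms(TICK_MS);
	}
	return -1;
}
```

Note that at 4k messages per second the sender has only ~250 us per
message, so a single 1 ms wait already spans several message periods -
which is why a blocking retry alone cannot sustain the rate once the rx vq
stalls for long.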
> > > Alternatively, you can register a platform-specific wait mechanism
> > > via this callback:
> > > https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42
> > > 
> > > A few questions for further understanding:
> > > 
> > > 1) As per your use case, must the 4k messages per second transfer
> > > rate be maintained all the time? And is this achieved with this
> > > patch?
> > > 
> > > Even with the high priority queue, if someone wants to achieve 8k or
> > > 16k messages per second, at some point we will hit this issue again.
> > > 
> > 
> > Right, I also think this patch is not the right solution.
> 
> Hmmm. My understanding of Tanmay's comments is somewhat different. He
> is not "against" this patch in general, AFAIU. Please see the reply I
> sent a few minutes ago with a more detailed description of our system
> setup, its message flow and its limitations.
>

Regardless of how we spin things around, this patch is about running out of
resources (CPU cycles and memory).  It is only a matter of time before this
solution becomes obsolete.

The main issue here is that we are adding a priority workqueue for everyone
using this driver, which may have unwanted side effects.  Please add a kernel
module parameter to control what kind of workqueue is to be used.
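Something along these lines would do - a sketch only, not compilable
outside a kernel tree, and the parameter name, workqueue name and helper
function are all made up for illustration:

```c
/* Hypothetical module parameter selecting the workqueue type at load time:
 * the default keeps today's behaviour, and users who need low-latency
 * rx vq processing can opt in with high_prio_wq=1. */
static bool high_prio_wq;
module_param(high_prio_wq, bool, 0444);
MODULE_PARM_DESC(high_prio_wq,
		 "Use a WQ_HIGHPRI workqueue for rx virtqueue work (default: n)");

static struct workqueue_struct *example_alloc_rx_wq(void)
{
	unsigned int flags = high_prio_wq ? WQ_HIGHPRI : 0;

	/* "rpmsg_rx" is an illustrative name; max_active 0 = default */
	return alloc_workqueue("rpmsg_rx", flags, 0);
}
```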

Thanks,
Mathieu  
 
> > > The reliable solution would be to keep the data transfer rate
> > > reasonable and have a solid retry mechanism.
> > > 
> > > I am okay with taking this patch in after addressing the comments
> > > below, but please make sure all the above things on the R5 side are
> > > working as well.
> > 
> > Tanmay is correct on all fronts.
> 
> Agreed.
> 
> Thanks,
> Stefan
> 
