On Wed, Dec 17, 2025 at 11:27:44AM +0100, Stefan Roese wrote:
> Hi Mathieu,
> 
> On 12/16/25 22:47, Mathieu Poirier wrote:
> > On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote:
> > > Hi Mathieu,
> > > 
> > > On 12/15/25 02:14, Mathieu Poirier wrote:
> > > > On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote:
> > > > > Hello, please check my comments below:
> > > > > 
> > > > > On 12/10/25 2:29 AM, Stefan Roese wrote:
> > > > > > Hi Tanmay,
> > > > > > 
> > > > > > On 12/10/25 03:51, Zhongqiu Han wrote:
> > > > > > > On 12/5/2025 8:06 PM, Stefan Roese wrote:
> > > > > > > > Hi Tanmay,
> > > > > > > > 
> > > > > > > > On 12/4/25 17:45, Tanmay Shah wrote:
> > > > > > > > > Hello,
> > > > > > > > > 
> > > > > > > > > Thank you for your patch. Please find my comments below.
> > > > > > > > > 
> > > > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote:
> > > > > > > > > > Testing on our ZynqMP platform has shown that some R5
> > > > > > > > > > messages might get dropped under high CPU load. This
> > > > > > > > > > patch creates a new high-prio
> > > > > 
> > > > > This commit text should be fixed. Messages are not dropped by
> > > > > Linux, but the R5 can't send new messages as the rx vq is not
> > > > > processed by Linux.
> > > > 
> > > > I agree.
> > > > 
> > > > > > > > > Here, I would like to understand what is meant by "R5
> > > > > > > > > messages might get dropped".
> > > > > > > > > 
> > > > > > > > > Even under high CPU load, the messages from the R5 are
> > > > > > > > > stored in the virtqueues. If Linux doesn't read them, they
> > > > > > > > > are not really lost/dropped.
> > > > > > > > > 
> > > > > > > > > Could you please explain your use case in detail and how
> > > > > > > > > the testing is conducted?
> > > > > > > > 
> > > > > > > > Our use case is that we send ~4k messages per second from
> > > > > > > > the R5 to Linux - sometimes even a bit more.
> > > > > > > > Normally these messages are received okay and no messages
> > > > > > > > are dropped. Sometimes, under "high CPU load" scenarios, the
> > > > > > > > R5 has to drop messages, as there is no free space in the
> > > > > > > > RPMsg buffer, which is 256 entries AFAIU - resulting from
> > > > > > > > the Linux driver not emptying the RX queue.
> > > > > 
> > > > > Thanks for the details. Your understanding is correct.
> > > > > 
> > > > > > > > Could you please elaborate on these virtqueues a bit?
> > > > > > > > Especially why no message drop should happen because of
> > > > > > > > these virtqueues?
> > > > > > > 
> > > > > > > AFAIK, as a transport layer based on virtqueues, rpmsg is
> > > > > > > reliable once a message has been successfully enqueued. The
> > > > > > > observed "drop" here appears to be on the R5 side, where the
> > > > > > > application discards messages when no entry buffer is
> > > > > > > available.
> > > > > > 
> > > > > > Correct.
> > > > > > 
> > > > > > > In the long run, while improving the Linux side is
> > > > > > > recommended,
> > > > > > 
> > > > > > Yes, please.
> > > > > > 
> > > > > > > it could also be helpful for the R5 side to implement
> > > > > > > strategies such as an application-level buffer and retry
> > > > > > > mechanisms.
> > > > > > 
> > > > > > We already did this. We've added an additional buffer mechanism
> > > > > > to the R5, which improved this "message drop situation" a bit.
> > > > > > Still, it did not fix it for all our high message rate
> > > > > > situations - still resulting in frame drops on the R5 side (the
> > > > > > R5 is a bit resource restricted).
> > > > > > 
> > > > > > Improving the responsiveness on the Linux side seems to be the
> > > > > > best way for us to deal with this problem.
> > > > > 
> > > > > I agree with this.
> > > > > However, I just want to understand and cover the full picture
> > > > > here.
> > > > > 
> > > > > On the R5 side, I am assuming the open-amp library is used for
> > > > > the RPMsg communication.
> > > > > 
> > > > > The rpmsg_send() API will end up here:
> > > > > https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384
> > > > > 
> > > > > Here, if no new buffer is available, the R5 is supposed to wait
> > > > > for 1ms before sending a new message. After 1ms, the R5 will try
> > > > > to get a buffer again, and this continues for 15 seconds. This is
> > > > > the default mechanism.
> > > > > 
> > > > > Is this mechanism used in your case?
> > > > > 
> > > > > Alternatively, you can register a platform-specific wait
> > > > > mechanism via this callback:
> > > > > https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42
> > > > > 
> > > > > A few questions for further understanding:
> > > > > 
> > > > > 1) As per your use case, must the 4k-per-second data transfer
> > > > > rate be maintained all the time? And is this achieved with this
> > > > > patch?
> > > > > 
> > > > > Even with the high-priority queue, if someone wants to achieve 8k
> > > > > or 16k messages per second, at some point we will hit this issue
> > > > > again.
> > > > 
> > > > Right, I also think this patch is not the right solution.
> > > 
> > > Hmmm. My understanding of Tanmay's comments is somewhat different.
> > > He is not "against" this patch in general AFAIU. Please see my reply
> > > with a more detailed description of our system setup, its message
> > > flow and its limitations that I just sent a few minutes ago.
> > 
> > Regardless of how we spin things around, this patch is about running
> > out of resources (CPU cycles and memory).
> > It is only a matter of time before this solution becomes obsolete.
> > 
> > The main issue here is that we are adding a priority workqueue for
> > everyone using this driver, which may have unwanted side effects.
> > Please add a kernel module parameter to control what kind of
> > workqueue is to be used.
> 
> Okay, will do.
Please see this patchset [1] Tanmay is currently working on. I would much
rather see that solution put to work than playing with workqueue
priorities.

[1] "[RFC PATCH 0/2] Enhance RPMsg buffer management"

> 
> Thanks,
> Stefan
