On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote:
> Hi Mathieu,
>
> On 12/15/25 02:14, Mathieu Poirier wrote:
> > On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote:
> > > Hello, please check my comments below:
> > >
> > > On 12/10/25 2:29 AM, Stefan Roese wrote:
> > > > Hi Tanmay,
> > > >
> > > > On 12/10/25 03:51, Zhongqiu Han wrote:
> > > > > On 12/5/2025 8:06 PM, Stefan Roese wrote:
> > > > > > Hi Tanmay,
> > > > > >
> > > > > > On 12/4/25 17:45, Tanmay Shah wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > Thank you for your patch. Please find my comments below.
> > > > > > >
> > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote:
> > > > > > > > Testing on our ZynqMP platform has shown that some R5 messages
> > > > > > > > might get dropped under high CPU load. This patch creates a new
> > > > > > > > high-prio
> > >
> > > This commit text should be fixed. Messages are not dropped by Linux, but
> > > the R5 can't send new messages as the rx vq is not processed by Linux.
> >
> > I agree.
> >
> > > > > > > Here, I would like to understand what it means by "R5
> > > > > > > messages might get dropped".
> > > > > > >
> > > > > > > Even under high CPU load, the messages from the R5 are stored in
> > > > > > > the virtqueues. If Linux doesn't read them, they are not
> > > > > > > really lost/dropped.
> > > > > > >
> > > > > > > Could you please explain your use case in detail and how the
> > > > > > > testing is conducted?
> > > > > >
> > > > > > Our use case is that we send ~4k messages per second from the R5
> > > > > > to Linux - sometimes even a bit more. Normally these messages are
> > > > > > received okay and no messages are dropped. Sometimes, under "high
> > > > > > CPU load" scenarios, it happens that the R5 has to drop messages,
> > > > > > as there is no free space in the RPMsg buffer, which is 256
> > > > > > entries AFAIU.
> > > > > > Resulting from the Linux driver not emptying the RX queue.
> > >
> > > Thanks for the details. Your understanding is correct.
> > >
> > > > > > Could you please elaborate on these virtqueues a bit? Especially
> > > > > > why no message drop should happen because of these virtqueues?
> > > > >
> > > > > AFAIK, as a transport layer based on virtqueues, rpmsg is reliable
> > > > > once a message has been successfully enqueued. The observed "drop"
> > > > > here appears to be on the R5 side, where the application discards
> > > > > messages when no entry buffer is available.
> > > >
> > > > Correct.
> > > >
> > > > > In the long run, while improving the Linux side is recommended,
> > > >
> > > > Yes, please.
> > > >
> > > > > it could also be helpful for the R5 side to implement strategies
> > > > > such as an application-level buffer and retry mechanisms.
> > > >
> > > > We already did this. We've added an additional buffer mechanism to
> > > > the R5, which improved this "message drop situation" a bit. Still, it
> > > > did not fix it for all our high message rate situations - still
> > > > resulting in frame drops on the R5 side (the R5 is a bit resource
> > > > restricted).
> > > >
> > > > Improving the responsiveness on the Linux side seems to be the best
> > > > way for us to deal with this problem.
> > >
> > > I agree with this. However, I just want to understand and cover the
> > > full picture here.
> > >
> > > On the R5 side, I am assuming the open-amp library is used for the
> > > RPMsg communication.
> > >
> > > The rpmsg_send() API will end up here:
> > > https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384
> > >
> > > Here, if no new buffer is available, the R5 is supposed to wait for
> > > 1 ms before sending a new message. After 1 ms, the R5 will try to get
> > > a buffer again, and this continues for 15 seconds.
> > > This is the default mechanism. Is this mechanism used correctly in
> > > your case?
> > >
> > > Alternatively, you can register a platform-specific wait mechanism via
> > > this callback:
> > > https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42
> > >
> > > A few questions for further understanding:
> > >
> > > 1) As per your use case, must the 4k messages per second transfer rate
> > > be maintained all the time? And is this achieved with this patch?
> > >
> > > Even after having the high-priority queue, if someone wants to achieve
> > > 8k or 16k messages per second, at some point we will hit this issue
> > > again.
> >
> > Right, I also think this patch is not the right solution.
>
> Hmm. My understanding of Tanmay's comments is somewhat different. He
> is not "against" this patch in general AFAIU. Please see my reply with
> a more detailed description of our system setup and its message flow
> and limitations that I just sent a few minutes ago.
>
Regardless of how we spin things around, this patch is about running out
of resources (CPU cycles and memory). It is only a matter of time before
this solution becomes obsolete.

The main issue here is that we are adding a priority workqueue for
everyone using this driver, which may have unwanted side effects. Please
add a kernel module parameter to control what kind of workqueue is to be
used.

Thanks,
Mathieu

> > > The reliable solution would be to keep the data transfer rate
> > > reasonable, and have a solid retry mechanism.
> > >
> > > I am okay with taking this patch in after addressing the comments
> > > below, but please make sure all the above things on the R5 side are
> > > working as well.
> >
> > Tanmay is correct on all fronts.
>
> Agreed.
>
> Thanks,
> Stefan
>
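A module parameter like the one Mathieu requests could look roughly like the sketch below. This is a minimal illustration only, not the actual patch: the `high_prio_wq` parameter name and the `"zynqmp_r5_rx"` workqueue name are hypothetical, and a real change would wire this into the driver's existing probe path:

```c
/* Sketch: opt-in high-priority workqueue via a module parameter.
 * "high_prio_wq" is a hypothetical name, not an existing knob.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static bool high_prio_wq;
module_param(high_prio_wq, bool, 0444);
MODULE_PARM_DESC(high_prio_wq,
		 "Use a WQ_HIGHPRI workqueue for mailbox RX processing (default: false)");

static struct workqueue_struct *r5_wq;

static int r5_wq_init(void)
{
	/* Default behaviour stays unchanged unless the user opts in. */
	r5_wq = alloc_workqueue("zynqmp_r5_rx",
				high_prio_wq ? WQ_HIGHPRI : 0, 0);
	return r5_wq ? 0 : -ENOMEM;
}
```

With a boot-time (or modprobe-time) parameter like this, users with high message rates can opt into `WQ_HIGHPRI` scheduling while everyone else keeps the current behaviour.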
