Re: [beagleboard] How to reliably push data from ARM host to PRU (shared) memory with predictable (low) latency?

ags Wed, 22 Mar 2017 22:14:11 -0700

You've hit the nail on the head. The issue (IMO) is Linux "wandering off 
into the weeds". It comes back, eventually... but while gone, bad things 
happen.

1) I am using a handshake approach between PRU and ARM, using interrupts. 
When the PRU wants more data, it generates an ARM interrupt. The user space 
application listens for the interrupt (using select()) and, when it is 
received, sends more data. The application then tells the PRU that the data 
is ready by sending an interrupt to the PRU.
2) I am using a ring (though with only two compartments, it seems more like 
a "line") to send the data. I think of it as a tic/tock, or ping/pong 
approach: when one "side" (half) of the data space has been read by the 
PRU, it signals the ARM host to send another (half) buffer full of data. So 
the PRU is always reading from one half while the ARM is loading the other.
3) While the average data rate I need to sustain is about 13 Mbit/s 
(not a problem), the challenge is ensuring, under all conditions, that I can 
send 262 kbit of data from ARM to PRU, in chunks small enough to fit into 
the 12 KB PRU shared RAM, in a "timely manner". With my current design, this 
requires sending 4 KiB of data from ARM to PRU shared RAM, completing the 
transaction within 960 µs of the request for more data. The limiting 
factors are the timing (I can't starve the PRU of data, otherwise the 
output bitstream will have gaps that corrupt the content for the client 
outside the BBB) and the size of the PRU memory (if I could load a full 
"frame buffer" of data at once, the PRU would never be starved, but the PRU 
shared RAM only holds 1/8 of the data the PRU needs for each burst).
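
To make the numbers in (2) and (3) concrete, here is a rough sketch in C. 
The macro and function names are made up for illustration, and I am assuming 
that each half of the ping/pong space is the 4 KiB chunk mentioned above and 
that a burst is 8 such chunks. On real hardware shared_ram would be a pointer 
into the mmap()ed PRU shared RAM; here it is just a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Figures from the message; the names are illustrative only. */
#define CHUNK_BYTES       4096u  /* assumed size of one ping/pong half (4 KiB) */
#define DEADLINE_US       960u   /* refill must complete within 960 us         */
#define CHUNKS_PER_BURST  8u     /* shared RAM holds 1/8 of one burst          */

/* Total bits in one burst: 8 chunks * 4096 bytes * 8 bits. */
static inline uint32_t burst_bits(void)
{
    return CHUNKS_PER_BURST * CHUNK_BYTES * 8u;
}

/* Minimum copy rate (bits per second, rounded down) needed to move
 * one chunk within the deadline. */
static inline uint64_t min_refill_rate_bps(void)
{
    return (uint64_t)CHUNK_BYTES * 8u * 1000000u / DEADLINE_US;
}

/* Ping/pong: byte offset of the half to fill for the n-th request
 * (0-based). Even requests use the first half, odd ones the second. */
static inline size_t fill_offset(uint32_t request_index)
{
    return (size_t)(request_index & 1u) * CHUNK_BYTES;
}

/* Copy one chunk into the half selected by request_index. */
static inline void refill_half(uint8_t *shared_ram, const uint8_t *src,
                               uint32_t request_index)
{
    assert(shared_ram != NULL && src != NULL);
    memcpy(shared_ram + fill_offset(request_index), src, CHUNK_BYTES);
}
```

With these figures burst_bits() gives 262144 (the "262 kbit" above), and the 
copy itself has to run at roughly 34 Mbit/s to meet the 960 µs deadline, 
well above the 13 Mbit/s average.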

I thought using select() to wait for notification of an event (by 
"listening" to the fsys uio files) would free the ARM CPU to do other 
things while waiting, while still giving the user space application the 
most immediate path to send more data. Is there a better way?
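
For reference, the wait I am describing looks roughly like the sketch below. 
It uses only plain POSIX select() and read(); the function name and the 
timeout parameter are made up for illustration, and I am assuming the usual 
UIO convention that reading the device file returns a 32-bit event counter. 
Opening the device and any driver-specific step to clear or re-arm the 
interrupt are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Block until fd becomes readable or the timeout expires, then read the
 * 32-bit event counter into *count.
 * Returns 1 if an event was read, 0 on timeout, -1 on error. */
static int wait_for_event(int fd, long timeout_us, uint32_t *count)
{
    fd_set rfds;
    struct timeval tv;

    assert(count != NULL);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_us / 1000000;
    tv.tv_usec = timeout_us % 1000000;

    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;              /* 0 = timeout, -1 = select() failed */

    ssize_t n = read(fd, count, sizeof *count);
    return n == (ssize_t)sizeof *count ? 1 : -1;
}
```

The process sleeps inside select() until the event arrives, so the CPU is 
free in the meantime, but how quickly the process is woken afterwards is 
entirely up to the scheduler, which is where the latency problem comes from.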

On Wednesday, March 22, 2017 at 9:43:24 AM UTC-7, William Hermans wrote:
>
>
>
> On Wed, Mar 22, 2017 at 8:45 AM, Charles Steinkuehler <
> [email protected]> wrote:
>
>>
>> Note you might need an -rt or -xenomai kernel to achieve reliable
>> operation, I've seen the non-rt kernels occasionally "wander off into
>> the weeds" for several hundred ms at a time.
>>
>> --
>> Charles Steinkuehler
>> [email protected]
>>
>>
> "Wander off into the weeds . . ." I get a kick out of that expression 
> every time I see it in this context.
>
> I do agree with Charles, and would like to add that you need to pay close 
> attention to which C and Linux API function calls you use in your 
> application. Function calls such as printf(), which can be handy for quick 
> and dirty text debugging, can slow your code down considerably. However, if 
> you pipe the output of such an application into a file, you'll notice a 
> huge performance improvement with that single trick alone. Anything related 
> to threading, file locks (poll(), etc.), and so on through Linux API calls 
> is also going to slow you down. Certainly there is more, but these are the 
> three things I've personally tested and can think of off the top of my 
> head. Also, under certain conditions, using usleep() in your code where 
> you're using a busy wait loop can help some, but at other times it could 
> potentially backfire, depending on how busy your system is. Either way 
> though, a busy wait loop without usleep(), or without otherwise giving CPU 
> time back to the system, will wind up using ~95% processor time until 
> "preempted". Just remember that you only have one core / thread to work 
> with.
>
> You may also need to slim down unneeded processes, services, and kernel 
> modules that are loaded / running by default on a stock beaglebone Linux 
> image, as all of these will compete for CPU time, which you may not be 
> able to afford if you want your application to perform as well as you'd 
> like. Basically you need to profile your system, and see what you can get 
> away with.
>
> So from personal experience, I can say with reasonable confidence that the 
> maximum possible latency with an RT kernel is going to be around 50ms. But 
> that is the number when your system is constantly "busy". If your system is 
> extremely busy, it can be more. But I've had an application that was doing 
> a lot of processing in code, but was only using up to 5% processor time, 
> because I was giving processor time back to the system by using usleep(). 
> But anyway, if you need "real-time", an RT kernel could work fine, depending 
> on your definition of the term. If you need deterministic, you may need to 
> use xenomai, move into the kernel, or potentially both.
>
> I would probably start by profiling your system to see what all is running 
> in the background, and if everything you do have running is necessary. 
> After that, try installing an RT kernel.
>
