Re: [beagleboard] Equivalent of PRU on main CPU

Charles Steinkuehler Wed, 05 Aug 2015 15:07:29 -0700

On 8/5/2015 4:49 PM, Lenny wrote:
> Hehe, a beast indeed :)
> 
> I downloaded the StarterWare software and I like it. I'll summarize my 
> current understanding; corrections are welcome.
> As far as I understand, StarterWare does not run on top of an OS, so your 
> code executes directly on the MPU - bare-metal access, so to speak. I 
> imagine the same can be accomplished by properly embedding your compiled 
> file into a bootloader at the right place. The provided examples are 
> reasonably clear. For example, to set a GPIO pin, you find the call
> 
>         GPIOPinWrite(GPIO_INSTANCE_ADDRESS,
>                      GPIO_INSTANCE_PIN_NUMBER,
>                      GPIO_PIN_LOW);
> 
> Looking at what is behind it, this is really a single statement:
>         HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1 << pinNumber);
> where the macro HWREG only provides a properly typed pointer to the 
> address in brackets. Working from these examples saves a painful and 
> time-consuming study of the TRM, which is where the addresses of all 
> those registers are listed.
> 
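To make the quoted HWREG line concrete, here is a minimal host-side sketch, not the StarterWare source. The macro has the same shape as the one described above. The register offsets are what I recall from the AM335x TRM and should be checked there. fake_gpio_regs, fake_gpio_base() and gpio_pin_write() are hypothetical names that stand in for a real GPIO module, so that the code can run on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the macro described above: treat an address as a
 * volatile 32-bit register. */
#define HWREG(x) (*((volatile uint32_t *)(x)))

/* Offsets as recalled from the AM335x TRM; verify before using on hardware. */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Hypothetical stand-in for one GPIO module's register block (host only). */
static uint32_t fake_gpio_regs[0x200 / 4];

/* Base "address" of the fake register block. */
static uintptr_t fake_gpio_base(void)
{
    return (uintptr_t)fake_gpio_regs;
}

/* Drive one pin low or high by writing a one-bit mask to the
 * clear or set register. No read-modify-write is needed. */
static void gpio_pin_write(uintptr_t baseAdd, unsigned pinNumber, int high)
{
    assert(pinNumber < 32u);
    if (high)
        HWREG(baseAdd + GPIO_SETDATAOUT) = (1u << pinNumber);
    else
        HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1u << pinNumber);
}
```

Calling gpio_pin_write(fake_gpio_base(), 5, 0) leaves bit 5 set in the fake CLEARDATAOUT word. On real hardware the write itself is a single store once the address and mask are in registers, but the sketch says nothing about when the pin actually changes.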
> So as far as I understand, this operation should also only take the 
> equivalent of one single assembler instruction after compilation. Two 
> questions now remain:
> 1) how many cycles does it take the MPU to execute this instruction (or any 
> other one - this is not specified in the TRM but I am sure it is somewhere 
> in the ARM documentation)
> 2) how long does it take until the value arrives at the output pin
> 
> The second question aims at Charles' concern. Again, from the TRM I deduce 
> that, for example, the GPIO modules are connected to the MPU through the L4 
> interconnect. The interface clock rate is specified in the GPIO chapter of 
> the TRM to be 100 MHz. Now I do not understand how this bus works in 
> detail, but the fact that it can handle several sources and destinations 
> simultaneously raises the concern that there is a buffer involved that 
> comes with some extra latency. But I would assume that by running only the 
> one code snippet that I define, with no OS processes in the background, 
> all other devices are disabled, and therefore the bus is really only used 
> when my program uses it. So my transfers should get top priority and 
> arrive with a minimal delay that is deterministic up to clock 
> synchronization effects. I would therefore estimate a delay of a few 
> interface clock cycles, i.e. a latency of less than, say, 1/(20 MHz) = 
> 50 ns. Is my reasoning correct here, or am I forgetting something?


See my previous mail: the numbers for the PRU will likely closely
match what you can do on the ARM core, since the interconnect is going
to be the limiting factor for performance.

tl;dr:
Writes will go fast, but won't show up at the pin for a while.
Reads will take about 165 ns.
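
To put the read figure next to the numbers quoted above (the 100 MHz interface clock and the 1/20 MHz guess), here is a small arithmetic sketch. The helper names are hypothetical and it only converts units; it does not model the interconnect.

```c
#include <assert.h>

/* Period of a clock in nanoseconds, given its frequency in MHz. */
static double period_ns(double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

/* Upper bound, in MHz, on back-to-back operations that each take latency_ns. */
static double max_rate_mhz(double latency_ns)
{
    assert(latency_ns > 0.0);
    return 1000.0 / latency_ns;
}

/* A latency expressed as a number of cycles of a clock running at freq_mhz. */
static double latency_in_cycles(double latency_ns, double freq_mhz)
{
    return latency_ns / period_ns(freq_mhz);
}
```

With these, period_ns(20.0) gives the 50 ns guess, latency_in_cycles(165.0, 100.0) gives 16.5 interface clock cycles per read, and max_rate_mhz(165.0) comes out a little above 6 MHz for a loop that does nothing but read.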

> I guess my next steps will be reading on the MPU itself, as to whether one 
> can hope to implement really fast algorithms on a very low level here. If 
> I'm not mistaken, this 
> <https://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMv7-M_ARM.pdf>
>  
> is the document to read. A first glance tells me that maybe I'll understand 
> what the A in Cortex-A9 actually means on a low level :)

Um...that's the -M manual, you want the -A manual, specifically the
Cortex-A9.  Just get it straight from the source:

http://www.arm.com/products/processors/cortex-a/cortex-a9.php

...click the "resources" tab.

-- 
Charles Steinkuehler
[email protected]
