Hehe, a beast indeed :)

I downloaded the StarterWare software and I like it. I'll summarize my 
current understanding, and if anyone wants to correct me where it's 
necessary, I'll be glad:
As far as I understand, StarterWare does not run on top of an OS, so there 
is no OS overhead and your code executes directly on the MPU: bare-metal 
access, so to speak. I imagine the same can be accomplished by properly 
embedding your compiled file into a bootloader at the right place. The 
provided examples are reasonably clear. For example, to drive a GPIO pin 
low you find the instruction

        GPIOPinWrite(GPIO_INSTANCE_ADDRESS,
                     GPIO_INSTANCE_PIN_NUMBER,
                     GPIO_PIN_LOW);

Looking at what is behind it, it really is a single simple statement:

        HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1 << pinNumber);

where the macro HWREG simply turns the address in parentheses into a 
properly typed pointer and dereferences it. Using these examples saves you 
a painful and time-consuming study of the TRM, which is where the addresses 
of all those registers are documented.
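To make the HWREG point concrete, here is a minimal sketch that runs on a PC. It assumes StarterWare's macro is defined as a volatile pointer dereference, `(*((volatile unsigned int *)(x)))`, and it assumes the AM335x GPIO offsets 0x190 for GPIO_CLEARDATAOUT and 0x194 for GPIO_SETDATAOUT. Please check both against your StarterWare headers and the TRM. `gpio_pin_write_sketch` and `fake_gpio` are hypothetical names: the first is a stand-in for GPIOPinWrite, the second is a plain array that stands in for the GPIO module. The sketch also uses `uintptr_t` for the base address so it works on a 64-bit host.

```c
#include <stdint.h>

/* Assumed to match StarterWare's definition: cast the address to a
 * volatile pointer and dereference it, so every access really happens. */
#define HWREG(x) (*((volatile unsigned int *)(x)))

/* Assumed AM335x GPIO register offsets; verify against the TRM. */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Assumed pin-value constants, named as in the snippet above. */
#define GPIO_PIN_LOW  0u
#define GPIO_PIN_HIGH 1u

/* Hypothetical stand-in for GPIOPinWrite(): a single 32-bit store to
 * either the "set" or the "clear" register, with only this pin's bit set. */
void gpio_pin_write_sketch(uintptr_t baseAdd, unsigned int pinNumber,
                           unsigned int pinValue)
{
    if (pinValue == GPIO_PIN_HIGH) {
        HWREG(baseAdd + GPIO_SETDATAOUT) = (1u << pinNumber);
    } else {
        HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1u << pinNumber);
    }
}

/* Hypothetical fake GPIO module for running on a PC: enough 32-bit words
 * to cover offsets up to GPIO_SETDATAOUT. On the board, baseAdd would be
 * the module's real base address instead. */
static unsigned int fake_gpio[(GPIO_SETDATAOUT / 4u) + 1u];

uintptr_t fake_gpio_base(void)
{
    return (uintptr_t)fake_gpio;
}
```

For example, `gpio_pin_write_sketch(fake_gpio_base(), 21u, GPIO_PIN_LOW)` leaves `1u << 21` in the word at offset 0x190. Because the module has separate set and clear registers, no read-modify-write of the output register is needed, so the write itself is one store.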

So as far as I understand, this operation should also boil down to roughly 
one assembler instruction (a single store) after compilation. Two 
questions now remain:
1) How many cycles does it take the MPU to execute this instruction (or any 
other one)? This is not specified in the TRM, but I am sure it is somewhere 
in the ARM documentation.
2) How long does it take until the value arrives at the output pin?

The second question addresses Charles's concern. Again, from the TRM I 
deduce that, for example, the GPIO modules are connected to the MPU through 
the L4 interconnect. The GPIO chapter of the TRM specifies the interface 
clock rate as 100 MHz. I do not understand how this bus works in detail, 
but the fact that it can handle several sources and destinations 
simultaneously raises the concern that a buffer is involved, which would 
add some extra latency. However, I would assume that if I run only the one 
code snippet I define, with no OS processes in the background, all other 
devices stay disabled and the bus is used only when my program uses it. 
My packets should then get top-priority handling and arrive with a minimal 
delay that is deterministic apart from clock-synchronization jitter. So I 
would estimate a delay of a few interface clock cycles, i.e. a latency of 
less than, say, 1/(20 MHz) = 50 ns. Is my reasoning correct here, or am I 
forgetting something?
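The back-of-the-envelope estimate can be written out as a few lines of arithmetic. This only uses the two numbers from the text (a 100 MHz interface clock and a 1/(20 MHz) budget); the number of interface cycles a write actually needs is left as a parameter because it is exactly what I don't know, and the helper names are my own.

```c
/* Period of one clock cycle in nanoseconds, for a clock given in MHz. */
double period_ns(double clock_mhz)
{
    return 1000.0 / clock_mhz;
}

/* Latency in ns if a write needs `cycles` interface clock cycles. */
double latency_ns(unsigned int cycles, double clock_mhz)
{
    return cycles * period_ns(clock_mhz);
}

/* Number of whole interface cycles that fit into a latency budget in ns. */
unsigned int cycles_in_budget(double budget_ns, double clock_mhz)
{
    return (unsigned int)(budget_ns / period_ns(clock_mhz));
}
```

At 100 MHz one interface cycle is 10 ns, so a 1/(20 MHz) = 50 ns budget (`cycles_in_budget(period_ns(20.0), 100.0)`) leaves room for about 5 interface cycles for the whole trip from the store to the pin. Whether the real path fits in that is the open question.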

I guess my next step will be reading up on the MPU itself, to see whether 
one can hope to implement really fast algorithms at a very low level here. 
I first found this 
<https://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMv7-M_ARM.pdf> 
but it covers the ARMv7-M profile, so for a Cortex-A core the ARMv7-A and 
ARMv7-R Architecture Reference Manual is probably the document to read. A 
first glance tells me that maybe I'll understand what the A in Cortex-A9 
actually means on a low level :)

If there is a catch that I am not aware of, thanks in advance for letting 
me know!

-- 
For more options, visit http://beagleboard.org/discuss
--- 
You received this message because you are subscribed to the Google Groups 
"BeagleBoard" group.