Hehe, a beast indeed :)
I downloaded the StarterWare software and I like it. I'll summarize my
current understanding, and if someone wants to correct me where that is
necessary, I'll be glad:
As far as I understand, StarterWare does not run on top of an OS, so your
code executes directly on the MPU - bare metal, so to say. I imagine the
same can be accomplished by having a bootloader place your compiled binary
at the right address. The provided examples are reasonably clear; for
example, to set a GPIO pin, you find the call
    GPIOPinWrite(GPIO_INSTANCE_ADDRESS,
                 GPIO_INSTANCE_PIN_NUMBER,
                 GPIO_PIN_LOW);
Checking what is behind it, this is really a single simple statement
HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1 << pinNumber);
where the macro HWREG only casts the address in brackets to a suitably
typed pointer and dereferences it. Using these examples saves you a painful
and time-consuming study of the TRM, where you can find the addresses of
all those registers.
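
To make the HWREG point concrete, here is a minimal sketch of how I picture
it. This is not the actual StarterWare source: the helper names
gpio_pin_write and fake_gpio_read and the fake_gpio_regs array are made up
for illustration, and the two register offsets are what I read from the
AM335x TRM, so treat them as an assumption and check them for your device.
The fake register bank only exists so that the snippet can also be tried on
a PC; on the board, baseAdd would be the real GPIO instance address.

```c
#include <assert.h>
#include <stdint.h>

/* Same idea as HWREG: treat an address as a volatile 32-bit register. */
#define HWREG(addr) (*((volatile uint32_t *)(addr)))

/* Offsets as I read them from the AM335x TRM (assumption, please verify). */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Hypothetical stand-in for one GPIO module, so this also runs on a PC. */
static volatile uint32_t fake_gpio_regs[0x200 / 4];

/* Drive one pin high or low with a single register write.
 * Writing a 1 bit to SETDATAOUT/CLEARDATAOUT affects only that pin,
 * so no read-modify-write is needed. */
static void gpio_pin_write(uintptr_t baseAdd, unsigned pinNumber, int high)
{
    assert(pinNumber < 32u);
    if (high)
        HWREG(baseAdd + GPIO_SETDATAOUT) = (1u << pinNumber);
    else
        HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1u << pinNumber);
}

/* Read back a word from the fake register bank (PC-only helper). */
static uint32_t fake_gpio_read(uint32_t offset)
{
    return fake_gpio_regs[offset / 4];
}
```

On a PC one would call gpio_pin_write((uintptr_t)fake_gpio_regs, 3, 0) and
then inspect fake_gpio_read(GPIO_CLEARDATAOUT). The volatile qualifier is
what keeps the compiler from optimizing the store away or reordering it
against other register accesses.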
So as far as I understand, after compilation this operation should boil
down to little more than a single store instruction (plus loading the
address and the value into registers). Two questions now remain:
1) How many cycles does it take the MPU to execute this instruction (or any
other one)? This is not specified in the TRM, but I am sure it is somewhere
in the ARM documentation.
2) How long does it take until the value arrives at the output pin?
The second question aims at Charles' concern. Again, from the TRM I deduce
that, for example, the GPIO modules are connected to the MPU through the L4
interconnect. The interface clock rate is specified in the GPIO chapter of
the TRM to be 100 MHz. Now, I do not understand how this bus works in
detail, but the fact that it can handle several sources and destinations
simultaneously raises the concern that there is a buffer involved that
comes with some extra latency. But I would assume that by running only the
one code snippet that I define, with no OS processes in the background, all
other devices are disabled, and therefore the bus is really only used when
my program uses it. So there should be top-priority handling for my
transfers, and they should arrive with a minimal delay that is
deterministic up to clock missynchronization. So I would just estimate a
delay of a few interface clock cycles, i.e. a latency of less than, say,
1/(20 MHz) = 50 ns. Is my reasoning correct here, or am I forgetting
something?
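
Just to spell out the arithmetic behind that estimate, here is a tiny
sketch. The function name cycles_to_ns is made up, and the cycle counts are
my guesses, not numbers from the TRM; only the 100 MHz interface clock is
taken from there.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a number of clock cycles into nanoseconds for a given clock. */
static double cycles_to_ns(uint32_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1e9 / clock_hz;
}
```

With a 100 MHz interface clock one cycle is 10 ns, so cycles_to_ns(5, 100e6)
gives 50 ns, which is the 1/(20 MHz) bound I guessed above. Whether the
interconnect really needs only a handful of cycles is exactly the open
question.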
I guess my next steps will be reading on the MPU itself, as to whether one
can hope to implement really fast algorithms on a very low level here. If
I'm not mistaken, this
<https://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMv7-M_ARM.pdf>
is the document to read. A first glance tells me that maybe I'll understand
what the A in Cortex-A9 actually means on a low level :)
If there is a catch that I am not aware of - thanks for letting me know!