Hehe, a beast indeed :) I downloaded the StarterWare software and I like it. I'll summarize my current understanding, and if someone wants to correct me where necessary, I'll be glad: As far as I understand, StarterWare does not involve any OS overhead, so you get to execute your code directly on the MPU - bare metal access, so to say. I imagine the same can be accomplished by properly embedding your compiled file into a bootloader at the right place. The provided examples are reasonably clear; for example, to set a GPIO pin, you find the instruction
GPIOPinWrite(GPIO_INSTANCE_ADDRESS, GPIO_INSTANCE_PIN_NUMBER, GPIO_PIN_LOW);

Checking what is behind it, this is really a simple instruction:

HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1 << pinNumber);

where the macro HWREG only provides a properly typed pointer to the address in brackets. Using these examples saves a painful and time-consuming study of the TRM, which is where you would otherwise have to look up the addresses of all those registers. So as far as I understand, this operation should also take only the equivalent of a single assembler instruction after compilation.

Two questions now remain:
1) How many cycles does it take the MPU to execute this instruction (or any other one - this is not specified in the TRM, but I am sure it is somewhere in the ARM documentation)?
2) How long does it take until the value arrives at the output pin?

The second question aims at Charles' concern. Again, from the TRM I deduce that, for example, the GPIO modules are connected to the MPU through the L4 interconnect. The interface clock rate is specified in the GPIO chapter of the TRM to be 100 MHz. Now, I do not understand how this bus works in detail, but the fact that it can handle several sources and destinations simultaneously raises the concern that there is a buffer involved that comes with some extra latency.

But I would assume that by running only the one code snippet that I define, with no OS processes in the background, all other devices are disabled, and therefore the bus is really only used when my program uses it. So there should be top-priority handling for my packets, and they should arrive with a minimal and, up to clock mis-synchronization, deterministic delay. So I would just estimate a delay of a few interface clock cycles, i.e. a latency of less than - say - 1/20 MHz = 50 ns, which is five cycles at 100 MHz. Is my reasoning correct here, or am I forgetting something?

I guess my next step will be reading up on the MPU itself, to see whether one can hope to implement really fast algorithms on a very low level here.
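As a minimal sketch of the register write and the latency estimate described above (not StarterWare's actual source): the HWREG definition below only mirrors the idea of a typed volatile pointer, the register offsets are assumed from the AM335x TRM GPIO chapter and should be checked there, and fake_gpio_bank, gpio_pin_low, gpio_pin_high and cycles_to_ns are hypothetical names introduced so that the code can be tried on a PC without hardware.

```c
#include <stdint.h>

/* Same idea as StarterWare's HWREG: view an address as a volatile 32-bit register. */
#define HWREG(addr) (*((volatile uint32_t *)(addr)))

/* Offsets assumed from the AM335x TRM GPIO chapter; verify against your TRM revision. */
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Hypothetical host-side stand-in for one GPIO register bank, so the sketch
 * runs on a PC. On the target, pass the real module base address instead. */
uint32_t fake_gpio_bank[0x200 / sizeof(uint32_t)];

/* Drive one pin low: a single store to the "clear" register,
 * so no read-modify-write of the data-out register is needed. */
void gpio_pin_low(uintptr_t baseAdd, unsigned pinNumber)
{
    HWREG(baseAdd + GPIO_CLEARDATAOUT) = (1u << pinNumber);
}

/* Drive one pin high: the same pattern, using the "set" register. */
void gpio_pin_high(uintptr_t baseAdd, unsigned pinNumber)
{
    HWREG(baseAdd + GPIO_SETDATAOUT) = (1u << pinNumber);
}

/* Convert a number of interface clock cycles into nanoseconds. */
double cycles_to_ns(unsigned cycles, double clock_hz)
{
    return (double)cycles * 1e9 / clock_hz;
}
```

With a 100 MHz interface clock, cycles_to_ns(5, 100e6) corresponds to the 1/20 MHz bound from the post. Note that even such a one-line write usually compiles to a store plus whatever instructions are needed to get the address and the bit mask into registers, so the cycle count should be read off the generated assembly rather than the C source.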
If I'm not mistaken, this <https://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readings/ARMv7-M_ARM.pdf> is the document to read. A first glance tells me that maybe I'll understand what the A in Cortex-A9 actually means on a low level :) If there is a catch that I am not aware of - thanks for letting me know!