I have a project down the road that will require fast writes from PRU to ARM/system DRAM. But I'm not there yet.
For this project, my focus is on reading data (from SD card, eMMC, USB stick, network, etc.) into DDR, pushing it to the PRUs, and then bit-banging it out with precise timing (using EGP). I am trying to avoid external circuit support and thus need deterministic timing. That's what got me very interested in the BBB, and perhaps others as well: what a great, low-cost, small-footprint combination of the scope/breadth/content/flexibility of Linux with these embedded real-time units. Eventually it dawned on me that there will be some latency/non-deterministic timing unless I use the PRUs completely fenced off from the system (ARM, DDR, etc.). So I'm trying to identify when/where that non-determinism can occur (and conversely, where it cannot).

When I referenced "shared DRAM" I was sloppy, thinking it was clear from the context. I mean the 12k shared DRAM that is part of the PRU-ICSS. I see that, or the two individual 8k DRAMs, as the "portal" to the ARM core (along with interrupts). I haven't coded it yet, but I think I'm pretty clear on pushing the data from userland to the PRUs (mmap() & /dev/mem, as was offered above). I already have a use planned for the three scratchpad areas, and for using the broadside interface for single-instruction transfers. They appear not to be subject to any conflict other than with the other PRU.

The point I'm trying to make is that, from the TRM, it appears there is the possibility of some non-deterministic latency whenever using anything connected to the 32-bit PRU-ICSS bus. That is because the system (ARM) can access that bus through the OCP slave - and it will have to do that if it's going to be pushing data to the 12k or 8k PRU-ICSS DRAM. I think I can manage that (using interrupts to trigger the ARM to write the data, and not starting any timing-critical steps until I can determine that the write is complete).
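For anyone following along, here is a rough, untested-on-hardware sketch of the userland side I have in mind: map the 12k shared RAM through /dev/mem, copy a block in, and only then set a flag the PRU can poll before starting its timing-critical section. The addresses are my reading of the AM335x memory map (PRU-ICSS at 0x4A300000, shared RAM at offset 0x10000) and should be checked against the TRM. The flag/payload layout and the function names are made up for illustration, and I use a flag word here instead of an interrupt just to keep it short.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed AM335x addresses; verify against the TRM memory map. */
#define PRU_ICSS_BASE       0x4A300000u
#define PRU_SHARED_RAM_OFF  0x00010000u
#define PRU_SHARED_RAM_SIZE 0x00003000u   /* 12 KB */

/* Hypothetical layout: word 0 = "ready" flag (holds word count), data after. */
#define READY_FLAG_WORD 0u
#define PAYLOAD_WORD    1u

/* Map the PRU shared RAM into this process. Returns NULL on failure.
 * Needs root (or equivalent access to /dev/mem) on the target. */
volatile uint32_t *pru_shared_map(int *fd_out)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, PRU_SHARED_RAM_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd,
                   (off_t)(PRU_ICSS_BASE + PRU_SHARED_RAM_OFF));
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return (volatile uint32_t *)p;
}

/* Copy nwords into the payload area, then publish the count in the flag
 * word. The PRU would poll the flag and start only once it is nonzero.
 * Returns 0 on success, -1 if the data does not fit. */
int pru_shared_push(volatile uint32_t *ram, const uint32_t *src, size_t nwords)
{
    assert(ram != NULL);
    if (nwords > PRU_SHARED_RAM_SIZE / 4u - PAYLOAD_WORD)
        return -1;

    ram[READY_FLAG_WORD] = 0;              /* mark buffer as not ready */
    for (size_t i = 0; i < nwords; i++)
        ram[PAYLOAD_WORD + i] = src[i];

    __sync_synchronize();                  /* GCC/Clang builtin: payload before flag */
    ram[READY_FLAG_WORD] = (uint32_t)nwords;
    return 0;
}
```

On the board this would be called as pru_shared_map(&fd) followed by pru_shared_push(ram, data, n), with munmap() and close() when done. The PRU firmware side (polling word 0, then clearing it when it has consumed the data) is not shown.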
But thinking this through has raised a question: if I have both PRUs executing, won't they be (potentially) competing for access to the single 32-bit PRU-ICSS bus each time they access their "own" 8k DRAM or the "shared" 12k DRAM? Both PRUs can access all three of these memories, and the diagram seems to indicate there is only one path to them. If this is true, then other than 12k being bigger than 8k, I don't see any advantage (or any difference at all, other than having the same address in memory for either of the PRUs) between using the 12k or 8k DRAM from either PRU. That's what I'm trying to verify, or be disabused of whatever mistake I've made.

To be specific, this is what I think will (can) happen:

- ARM writing to the 12k PRU shared DRAM can affect the timing of a PRU read/write to its own 8k DRAM, the other PRU's 8k DRAM, as well as the 12k PRU-ICSS shared DRAM.
- PRU0 reading/writing either 8k DRAM or the 12k DRAM can affect the timing of PRU1 reading/writing either 8k DRAM or the 12k DRAM, even if the source/target of PRU0 is not the same as the source/target of PRU1.
- Any reads from system resources (through the OCP master) are subject to stalls (e.g. peripherals, GPIO, ARM DDR).
- Any writes to system resources (through the OCP master) are also subject to stalls (but less likely) if the interconnect fabric has been saturated. (I was hoping I could get some rough idea of how much it takes to "saturate the interconnect fabric" - and whether only writes contribute, or reads as well.)

I will look at that BeagleLogic code and see if I can see how that was done. I'd still like to understand the underlying operation in more detail.

Thanks.

On Friday, February 10, 2017 at 8:07:15 AM UTC-8, Charles Steinkuehler wrote:
> On 2/9/2017 8:42 PM, William Hermans wrote:
> > But the point is really this. If you need to get data out of the PRU's into
> > userland Linux as quickly as possible. Maybe the way to pull that data out of the
> > PRU's memory is from the ARM (Linux) side of things?
>
> No, you want to have the PRU doing writes.
>
> In modern systems, writes are fast (they can get posted so they
> complete at the initiator side and can take their time working through
> the various interconnect fabrics to make their way to their ultimate
> destination). Reads typically stall the initiator until the data is
> received.
>
> If you need to move data quickly from the PRU to the ARM, reference
> the BeagleLogic code. That moves data pretty much as quickly as the
> hardware physically allows (which requires a kernel module):
>
> https://github.com/abhishek-kakkar/BeagleLogic
>
> --
> Charles Steinkuehler
