Re: [beagleboard] Re: Why use PRU shared DRAM (12k) over individual core DRAM (8k)?

ags Thu, 16 Feb 2017 16:43:43 -0800

I have a project down the road that will require fast writes from PRU to 
ARM/system DRAM. But I'm not there yet.

For this project, my focus is on reading data (from SD card, eMMC, USB 
stick, network, etc.) into DDR, pushing it to the PRUs, and then 
bit-banging it out with precise timing (using EGP). I am trying to avoid external 
circuit support and thus need deterministic timing. That's what got me very 
interested in the BBB. Perhaps others as well - what a great, low-cost, 
small-footprint combination of the scope/breadth/content/flexibility of 
Linux with these embedded real-time units.

Eventually it dawned on me that there will be some 
latency/non-deterministic timing unless I use the PRUs completely 
fenced-off from the system (ARM, DDR, etc). So I'm trying to identify 
when/where that non-determinism can occur (and conversely, where it cannot).

When I referenced "shared DRAM" I was sloppy, thinking it was clear in the 
context. I mean the 12k shared DRAM that is part of the PRU-ICSS. I see 
that, or the (2) individual 8k DRAMs as the "portal" to the ARM core (along 
with interrupts). I haven't coded it yet, but I think I'm pretty clear on 
pushing the data from userland to the PRUs (mmap() & /dev/mem as was 
offered above). I already have a use planned for the three scratchpad areas, 
using the broadside interface for single-instruction transfers. The scratchpads 
appear not to be subject to any contention other than from the other PRU.
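
A minimal sketch of the mmap() & /dev/mem route mentioned above, not tested on hardware. The base address and RAM offsets are the AM335x values as I read them in the TRM memory map and should be checked against your part; the names pru_map and pru_shared_write, and the 128 KB mapping span, are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed AM335x addresses (ARM view); verify against the TRM. */
#define PRU_ICSS_BASE   0x4A300000u     /* start of the PRU-ICSS window   */
#define PRU_MAP_SPAN    0x00020000u     /* enough to cover all data RAMs  */
#define PRU_DRAM0_OFF   0x00000u        /* 8k data RAM 0                  */
#define PRU_DRAM1_OFF   0x02000u        /* 8k data RAM 1                  */
#define PRU_SHARED_OFF  0x10000u        /* 12k shared data RAM            */
#define PRU_SHARED_SIZE (12u * 1024u)

/* Map the PRU data RAMs into this process. Returns NULL on failure
 * (for example when not running as root). */
volatile uint8_t *pru_map(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, PRU_MAP_SPAN, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)PRU_ICSS_BASE);
    close(fd);                          /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : (volatile uint8_t *)p;
}

/* Copy len bytes into the 12k shared RAM at byte offset off.
 * Returns 0 on success, -1 if the range does not fit in 12k. */
int pru_shared_write(volatile uint8_t *base, size_t off,
                     const void *src, size_t len)
{
    assert(base != NULL && src != NULL);
    if (off > PRU_SHARED_SIZE || len > PRU_SHARED_SIZE - off)
        return -1;
    const uint8_t *s = src;
    volatile uint8_t *d = base + PRU_SHARED_OFF + off;
    for (size_t i = 0; i < len; i++)    /* plain byte stores, no memcpy */
        d[i] = s[i];
    return 0;
}
```

Usage would be along the lines of `volatile uint8_t *base = pru_map();` followed by `pru_shared_write(base, 0, buf, len);` once base is known to be non-NULL. Every one of those stores goes through the OCP slave port, which is exactly the access path whose timing impact I'm asking about.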

The point I'm trying to make is that from the TRM, it appears there is the 
possibility of some non-deterministic latency whenever using anything 
connected to the 32-bit PRU-ICSS bus. That is because the system (ARM) can 
access that bus through the OCP slave - and it will have to do that if it's 
going to be pushing data to the 12k or 8k PRU-ICSS DRAM. I think I can 
manage that (using interrupts to trigger the ARM to write the data and not 
start any timing-critical steps until I can determine that write is 
complete). But when thinking this through, the question it has raised is 
this:

If I have both PRUs executing, won't they be (potentially) competing for 
access to the single 32-bit PRU-ICSS bus each time they access their "own" 
8k DRAM or the "shared" 12k DRAM? Both PRUs can access all three of these 
memory locations, and the diagram seems to indicate there is only one path 
to them. And if this is true, then other than 12k being bigger than 8k, I 
don't see any advantage (or difference at all, other than having the same 
address in memory for either of the PRUs) between using the 12k or 8k DRAM 
from either PRU.
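
To make the "same address for either PRU" remark concrete, here is a small sketch of the PRU-local data address map as I understand it from the TRM: each PRU sees its own 8k RAM at 0x0000 and the other PRU's at 0x2000, while the 12k RAM sits at 0x10000 for both. The function and enum names are hypothetical, and the map should be checked against the TRM before relying on it.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    MEM_OWN_DRAM,        /* this PRU's 8k data RAM        */
    MEM_OTHER_DRAM,      /* the other PRU's 8k data RAM   */
    MEM_SHARED_DRAM,     /* the 12k shared data RAM       */
    MEM_NOT_LOCAL_DATA   /* anything else                 */
} pru_mem_t;

/* Classify a PRU-local data address (assumed map, see lead-in). */
pru_mem_t pru_classify(uint32_t addr)
{
    if (addr < 0x2000u)
        return MEM_OWN_DRAM;
    if (addr < 0x4000u)
        return MEM_OTHER_DRAM;
    if (addr >= 0x10000u && addr < 0x13000u)
        return MEM_SHARED_DRAM;
    return MEM_NOT_LOCAL_DATA;
}

/* Which physical 8k RAM (0 or 1) PRU number pru reaches at addr,
 * or -1 if the address is not in either 8k RAM. */
int pru_physical_dram(int pru, uint32_t addr)
{
    assert(pru == 0 || pru == 1);
    switch (pru_classify(addr)) {
    case MEM_OWN_DRAM:   return pru;
    case MEM_OTHER_DRAM: return 1 - pru;
    default:             return -1;
    }
}
```

So address 0x0000 means a different physical RAM depending on which PRU issues the access, whereas 0x10000 means the same 12k RAM from both. The sketch says nothing about whether those accesses share one bus path, which is the open question.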

That's what I'm trying to verify (or, if I've made a mistake somewhere, 
be corrected on).

To be specific, this is what I think will (can) happen:

- ARM writes to the 12k PRU shared DRAM can affect the timing of a PRU's 
  reads/writes to its own 8k DRAM, the other PRU's 8k DRAM, and the 12k 
  shared DRAM.
- PRU0 reading/writing either 8k DRAM or the 12k DRAM can affect the timing 
  of PRU1 reading/writing either 8k DRAM or the 12k DRAM, even if the 
  source/target of PRU0 is not the same as the source/target of PRU1.
- Any reads from system resources (through the OCP master) are subject to 
  stalls (e.g. peripherals, GPIO, ARM DDR).
- Any writes to system resources (through the OCP master) are also subject 
  to stalls (but less likely), if the interconnect fabric has been 
  saturated. (I was hoping to get a rough idea of how much it takes to 
  "saturate the interconnect fabric", and whether only writes contribute 
  or reads as well.)
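
The "don't start timing-critical steps until the ARM write is complete" idea can be sketched as a simple flag handshake. This is a plain-C model that runs on any host, not PRU firmware; the mailbox layout, the names, and the 64-word size are invented for illustration. I have not verified whether a barrier alone is enough to order the stores as the PRU sees them through the OCP slave port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAILBOX_WORDS 64u   /* hypothetical buffer size */

/* Hypothetical layout placed in PRU data RAM. */
typedef struct {
    volatile uint32_t ready;                /* 0: ARM owns it, 1: PRU may read */
    volatile uint32_t count;                /* valid words in data[]           */
    volatile uint32_t data[MAILBOX_WORDS];
} mailbox_t;

/* ARM side: fill the buffer, then raise ready as the very last store.
 * Returns 0 on success, -1 if the buffer is busy or n is too large. */
int arm_publish(mailbox_t *mb, const uint32_t *src, uint32_t n)
{
    assert(mb != NULL && src != NULL);
    if (n > MAILBOX_WORDS || mb->ready)
        return -1;
    for (uint32_t i = 0; i < n; i++)
        mb->data[i] = src[i];
    mb->count = n;
    __sync_synchronize();                   /* GCC/Clang full memory barrier */
    mb->ready = 1;
    return 0;
}

/* PRU side (modelled in C): outside the timing-critical phase, copy the
 * payload into local storage and hand the buffer back to the ARM.
 * Returns the number of words copied, or -1 if nothing is ready. */
int pru_consume(mailbox_t *mb, uint32_t *dst, uint32_t cap)
{
    assert(mb != NULL && dst != NULL);
    if (!mb->ready)
        return -1;
    uint32_t n = mb->count;
    if (n > cap)
        n = cap;
    for (uint32_t i = 0; i < n; i++)
        dst[i] = mb->data[i];
    mb->ready = 0;                          /* ARM may refill from here on */
    return (int)n;
}
```

The intent is that the bit-banging loop only ever touches the local copy (registers or scratchpad), so an ARM refill of the mailbox can overlap with it without the loop itself issuing accesses to the contended RAM.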

I will look at that BeagleLogic code and see if I can see how that was 
done. I'd still like to understand the underlying operation in more detail. 
Thanks.
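
One way to test the contention cases empirically would be to time the same access many times on one PRU while the other PRU or the ARM generates traffic, then look at the spread. A minimal host-side sketch for summarizing such samples follows; it assumes the per-access cycle counts were captured on the PRU (for example from the cycle counter in the PRU control registers) and copied out afterwards. The names are hypothetical, and the 5 ns figure assumes the usual 200 MHz PRU clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRU_NS_PER_CYCLE 5u   /* assumes a 200 MHz PRU clock */

typedef struct {
    uint32_t min;        /* fastest observed access, in cycles */
    uint32_t max;        /* slowest observed access, in cycles */
    uint32_t spread;     /* max - min; 0 means no jitter seen  */
} jitter_t;

/* Summarize n per-access cycle counts. Returns all zeros for n == 0. */
jitter_t jitter_summary(const uint32_t *cycles, size_t n)
{
    jitter_t j = {0, 0, 0};
    assert(cycles != NULL || n == 0);
    if (n == 0)
        return j;
    j.min = j.max = cycles[0];
    for (size_t i = 1; i < n; i++) {
        if (cycles[i] < j.min)
            j.min = cycles[i];
        if (cycles[i] > j.max)
            j.max = cycles[i];
    }
    j.spread = j.max - j.min;
    return j;
}
```

A spread of zero over a long run with the other initiators active would suggest the access path is not shared; a nonzero spread (times PRU_NS_PER_CYCLE for nanoseconds) would give a rough bound on the stall. Neither result is a proof, only evidence for the run that was measured.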

On Friday, February 10, 2017 at 8:07:15 AM UTC-8, Charles Steinkuehler 
wrote:
>
> On 2/9/2017 8:42 PM, William Hermans wrote: 
> > 
> > But the point is really this. If you need to get data out of the PRU's into 
> > userland Linux as quickly as possible. Maybe the way to pull that data out of the 
> > PRU's memory is from the ARM (Linux) side of things ? 
>
> No, you want to have the PRU doing writes. 
>
> In modern systems, writes are fast (they can get posted so they 
> complete at the initiator side and can take their time working through 
> the various interconnect fabrics to make their way to their ultimate 
> destination).  Reads typically stall the initiator until the data is 
> received. 
>
> If you need to move data quickly from the PRU to the ARM, reference 
> the BeagleLogic code.  That moves data pretty much as quickly as the 
> hardware physically allows (which requires a kernel module): 
>
> https://github.com/abhishek-kakkar/BeagleLogic 
>
> -- 
> Charles Steinkuehler 
>
