On Fri, Mar 10, 2017 at 2:53 PM, ags <[email protected]> wrote:
> I've had a hard time getting any definitive responses to questions on the
> subject of memory access & latency. It is true that the PRU cores have
> faster access to the DRAM that is part of the PRU-ICSS (through the 32-bit
> interconnect SCR) - though not single-cycle - than to system DDR. However,
> the ARM core accesses DDR through the L3 fabric, but the PRU-ICSS through
> L4FAST, so I'm thinking that it can access DDR faster than PRU-ICSS memory.
>
> I've also asked about differences in latency/throughput/contention
> comparing the PRU-ICSS 12KB shared RAM to the 8KB data RAMs. No response.
> Since the 8KB data RAMs are accessible to both PRU cores, I'm not sure what
> the benefit of the 12KB shared RAM is (though I imagine there is one, I
> just can't figure it out).
>
> Lastly - and even more importantly - there is total agreement that you
> have to be careful about accessing any memory correctly. I have posted
> several times asking about the am335x_pru_package examples (using UIO).
> In at least one
> (https://github.com/beagleboard/am335x_pru_package/blob/master/pru_sw/example_apps/PRU_PRUtoPRU_Interrupt/PRU_PRUtoPRU_Interrupt.c),
> there is hardcoded use of the first 8 bytes of physical memory at
> 0x8000_0000. I don't see how that can be OK. It may be that I don't know
> some secrets of Linux internals, but from a theoretical perspective, I
> just don't know how one can make the assumption that any part of main
> memory is not in use by another process unless it is guaranteed by the
> kernel.

So here is what I meant. Of course, I have no personal hands-on experience; I'm looking at things from 35,000 feet. I *know* that writing directly to the PRU shared memory from userspace, through /dev/mem, would be just as fast, performance-wise, as writing to the 512MB of system DDR. On the PRU side, however, the PRUs would have single-cycle access to their own memory.
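For concreteness, here is a rough sketch (untested - I have no hands-on with this part either) of what mapping the PRU shared RAM through /dev/mem might look like from userspace. The addresses are my reading of the AM335x TRM global memory map - PRU-ICSS at 0x4A30_0000, with the 12KB shared RAM at offset 0x10000 - so please double-check them against the TRM before trusting anything:

```python
import mmap
import os
import struct

# AM335x global memory map as I read the TRM (assumptions - please verify):
PRUSS_BASE        = 0x4A30_0000   # PRU-ICSS as seen from the ARM side
PRU0_DRAM_OFFSET  = 0x00000       # 8KB data RAM, PRU0
PRU1_DRAM_OFFSET  = 0x02000       # 8KB data RAM, PRU1
SHARED_RAM_OFFSET = 0x10000       # 12KB shared RAM
SHARED_RAM_SIZE   = 12 * 1024

def map_window(phys_addr, length):
    """Compute the page-aligned base and in-page offset that mmap() needs.

    /dev/mem mapping offsets must be multiples of the page size, so we
    map from the aligned base and index back in by the remainder.
    Returns (aligned_base, offset_into_mapping, mapping_length).
    """
    page = mmap.ALLOCATIONGRANULARITY
    base = phys_addr & ~(page - 1)
    off = phys_addr - base
    return base, off, length + off

def write_shared_word(value, index=0):
    """Write one 32-bit word into PRU shared RAM (needs root)."""
    base, off, length = map_window(PRUSS_BASE + SHARED_RAM_OFFSET,
                                   SHARED_RAM_SIZE)
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    try:
        mem = mmap.mmap(fd, length, offset=base)
        try:
            pos = off + 4 * index
            mem[pos:pos + 4] = struct.pack("<I", value)
        finally:
            mem.close()
    finally:
        os.close(fd)
```

Note that the PRUs themselves would see this same RAM at their local offset 0x10000, not at the global address, and nothing in this sketch reserves the region from other /dev/mem users - that is exactly the "safely" question I can't answer.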
So the tricky part for me here would not be making sure we're writing to the right memory location, but knowing it's possible to begin with, because I have not attempted this personally. In fact, my hands-on experience with the PRU is limited to setting up a couple of examples and proving to myself that it would work with a 4.x kernel. So my only real "concern" is whether it really is possible to mmap() the physical address of the PRUs' shared memory, and whether that could be done "safely". But I do know that if it is possible, it would be faster - from the PRU side - than reading and writing the system's 512MB DDR, because of the fabric latency.

Not only that: from what I've read in the past, accessing devices or memory through that fabric can add a little bit of non-deterministic latency. So my thinking here is that "we'd" gain back the little bit of determinism that we lost by using DDR. Beyond that, I have no idea how important what I'm talking about is to you for your given project.

Address 0x8000_0000, though, I seem to recall is possibly related to the kernel, or perhaps the initrd. Another thing that I do not pretend to know 100% about is how Linux virtual memory works. When we say we're accessing "physical memory" through mmap(), we're actually accessing the device modules, or external memory, through virtual memory. So it could very well be that the person who wrote the UIO PRU examples knew this going in, and it's not by accident at all, but rather by design. I'd have to look further into the gory details of everything before I could make this determination.

--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/beagleboard/CALHSORrixYD7i697VM9Ksx3Kgz7Kp5umV5o%3DoKrGHxEzZ63Epg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
