On Friday, 19 September 2014 at 17:22:47 UTC-4, Charles Steinkuehler wrote:
>
> On 9/19/2014 3:51 PM, Cedric Malitte wrote:
> >
> > On Friday, 19 September 2014 at 16:46:05 UTC-4, Cedric Malitte wrote:
> >>
> >> Hi all,
> >>
> >> I had a few hours to play with the PRUSS, but I came to a dead end...
> >>
> >> My goal is to read ADCs, the ADS8326 to be precise.
> >> It's a kind of SPI ADC with one clock, one select, one out.
> >>
> >> I'd like to use 4 in parallel, which means only one clock, one select
> >> and 4 inputs on the PRUSS.
> >> I try to pull up the CLK line and then read each input, shifting them
> >> into variables to be sent to the main app.
> >>
> >> When I look at the CLK line on a scope, it's taking way too much time
> >> to get the input states and shift, even though the asm code should
> >> only take a few cycles.
> >> I'm lazy, I write the PRUSS code in C, but asm looks nice.
> >>
> >> Here's the code in C
>
> <snip>
>
> >> My great trouble is that it takes too much time, in fact way too much.
> >>
> >> Using this code, the CLK line is at 757 kHz.
> >> CLK high is around 1 us and low is the rest...
> >>
> >> I'd like to achieve at least 2 MHz for the CLK line.
> >>
> >> I might have misread the doc, but isn't an instruction supposed to
> >> take 5 ns?
> >> That should be 35 ns for the first part and 40 ns for the second part.
> >>
> >> Any clue or help?
> >>
> >> The learning curve is a bit harder than I thought :)
> >>
> >> Thanks
> >>
> > Well, I misread the doc... not all instructions are created equal :)
> >
> > Even so, it's still slow as hell to read the inputs...
>
> The *INSTRUCTION* takes 5 ns (or maybe 10-15, depending on exactly what
> you're doing), but since you're reading data from outside the PRU
> domain, the round-trip time for each GPIO read is killing your
> performance. You need to use the direct PRU inputs, and not general
> purpose I/O accessed through the AXI fabric.
>
> I have some details on read/write timings to the GPIO via the
> interconnect fabric in the comments of my PRU code for Machinekit:
>
> https://github.com/machinekit/machinekit/blob/master/src/hal/drivers/hal_pru_generic/pru_generic.p#L135-L163
>
> Note that *WRITES* from the PRU to the GPIO are fairly quick, but
> *READS* are very slow. This is because the write can be posted, allowing
> the PRU to continue executing code, but on reads the PRU stalls until
> the data is returned.
>
> Executive Summary of PRU <-> GPIO timing:
>
>   Peak GPIO write speed      :   10 ns (100 MHz)
>   Sustained GPIO write speed :   40 ns ( 25 MHz)
>   GPIO read speed            : ~165 ns ( ~6 MHz)
>
> You are then making things much worse by reading from the GPIO bank
> multiple times in your code. You should factor all the
> HWREG(SOC_GPIO_3_REGS + GPIO_DATAIN) accesses into a single read to a
> local variable, then use the local variable to do the bit manipulations,
> rather than performing the expensive read four times.
>
> Also, don't blame the compiler for not optimizing this for you. If you
> are wondering why this didn't get optimized, the compiler cannot treat a
> GPIO register read as a generic (i.e., cacheable) memory read, since the
> value read can potentially be different each time (i.e., the access is
> volatile). Therefore, it's up to you to do any read or write combining
> that is acceptable; the compiler can't do it for you. Also, even
> standard memory reads from DDR via the PRU are really volatile, since
> the ARM core is running in the background and could potentially be
> changing the values between each PRU access.
>
> Clean up your code a bit and I expect you'll see much better results!
>
> --
> Charles Steinkuehler
> [email protected]

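To illustrate the single-read advice: using the figures quoted above, four reads at roughly 165 ns each cost about 660 ns per clock cycle, which is already more than the 500 ns period of a 2 MHz clock. The sketch below takes one snapshot of the input register and does all the bit work on the local copy. The original C code was snipped from the thread, so this is not a reconstruction of it. The bit positions, NUM_ADCS, and the function names are hypothetical, and the register is passed in as a pointer so the logic can also be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions of the four ADC data lines inside the GPIO
 * bank. The real positions depend on the pin mux and are not given in
 * the thread. */
#define ADC0_BIT 14u
#define ADC1_BIT 15u
#define ADC2_BIT 16u
#define ADC3_BIT 17u
#define NUM_ADCS 4

/* Shift one new bit per channel into acc[], working only on a local
 * snapshot of the input register (no bus access happens here). */
static void shift_in_samples(uint32_t snapshot, uint16_t acc[NUM_ADCS])
{
    static const unsigned bit_pos[NUM_ADCS] = {
        ADC0_BIT, ADC1_BIT, ADC2_BIT, ADC3_BIT
    };

    for (int ch = 0; ch < NUM_ADCS; ch++) {
        uint32_t bit = (snapshot >> bit_pos[ch]) & 1u;
        acc[ch] = (uint16_t)((acc[ch] << 1) | bit);
    }
}

/* One expensive read through the interconnect, then cheap local bit work.
 * datain is assumed to point at the bank's data-input register. */
static void sample_once(const volatile uint32_t *datain,
                        uint16_t acc[NUM_ADCS])
{
    assert(datain != 0);           /* guards host-side experiments */
    uint32_t snapshot = *datain;   /* the only access that leaves the PRU */
    shift_in_samples(snapshot, acc);
}
```

Calling sample_once() once per clock edge would replace the four separate register reads with one, so the per-bit read cost would stay near the single-read figure above rather than four times that.
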
Thanks a lot, Charles, for the explanation.

Yes, my code is not really optimized; it's only a draft to play with the I/Os. I normally code for microcontrollers, and I assumed (my mistake) that the PRUSS would handle I/O the same way, I mean with direct access.

When I saw the delays, I also thought of reading the whole register once and then bitmasking it to get the pins I need. You confirmed this: reading once and bitmasking will be a lot faster. As long as I can get a read done in under 250 ns, it will be fine.

I had a quick look at your code and will dig into it later. I think I'll code directly in ASM, as I do not have that much to do: just an infinite loop to clock 2 pins, read the others, and send the value over RAM.

Regards,
Cedric
