On Friday, 19 September 2014 17:22:47 UTC-4, Charles Steinkuehler wrote:
>
> On 9/19/2014 3:51 PM, Cedric Malitte wrote: 
> > 
> > On Friday, 19 September 2014 16:46:05 UTC-4, Cedric Malitte wrote:
> >> 
> >> Hi all, 
> >> 
> >> I had a few hours to play with the PRUSS, but I came to a dead end...
> >>
> >> My goal is to read ADCs, the ADS8326 to be precise.
> >> It's an SPI-like ADC with one clock, one select, and one data output.
> >>
> >> I'd like to use 4 in parallel, which means only one clock, one select,
> >> and 4 inputs on the PRUSS.
> >> I try to pull the CLK line high and then read each input, shifting the
> >> bits into variables to be sent to the main app.
> >>
> >> When I look at the CLK line on a scope, it takes way too much time to
> >> get the input states and shift them, even though the asm code should
> >> only take a few cycles.
> >> I'm lazy, so I write the PRUSS code in C, but asm looks nice.
> >> 
> >> Here's the code in C 
>
> <snip> 
>
> >> My big problem is that it takes too much time, in fact way too much.
> >>
> >> Using this code, the CLK line runs at 757 kHz.
> >> CLK high is around 1 us and low is the rest...
> >>
> >> I'd like to achieve at least 2 MHz for the CLK line.
> >> 
> >> I might have misread the doc, but isn't an instruction supposed to
> >> take 5 ns?
> >> That should be 35 ns for the first part and 40 ns for the second part.
> >>
> >> Any clue or help?
> >>
> >> The learning curve is a bit steeper than I thought :)
> >> 
> >> Thanks 
> >> 
> > Well, I misread the doc... not all instructions are created equal :)
> >
> > Even so, it's still slow as hell to read the inputs...
>
> The *INSTRUCTION* takes 5 ns (or maybe 10-15, depending on exactly what
> you're doing), but since you're reading data from outside the PRU 
> domain, the round-trip time for each GPIO read is killing your 
> performance.  You need to use the direct PRU inputs, and not general 
> purpose I/O accessed through the AXI fabric. 
>
> I have some details on read/write timings to the GPIO via the 
> interconnect fabric in the comments of my PRU code for Machinekit: 
>
>
> https://github.com/machinekit/machinekit/blob/master/src/hal/drivers/hal_pru_generic/pru_generic.p#L135-L163
>  
>
> Note that *WRITES* from the PRU to the GPIO are fairly quick, but 
> *READS* are very slow.  This is because the write can be posted allowing 
> the PRU to continue on executing code, but on reads the PRU stalls until 
> the data is returned. 
>
> Executive Summary of PRU <-> GPIO timing: 
>
> Peak GPIO write speed      :   10 ns (100 MHz)
> Sustained GPIO write speed :   40 ns ( 25 MHz)
> GPIO Read speed            : ~165 ns ( ~6 MHz)
>
> You are then making things much worse by reading from the GPIO bank 
> multiple times in your code.  You should factor all the 
> HWREG(SOC_GPIO_3_REGS + GPIO_DATAIN) accesses into a single read to a 
> local variable, then use the local variable to do the bit manipulations, 
> rather than performing the expensive read four times. 
>
> Also, don't blame the compiler for not optimizing this for you.  If you
> are wondering why this didn't get optimized, the compiler cannot treat a
> GPIO register read as a generic (i.e. cacheable) memory read, since the
> value read can potentially be different each time (i.e. the access is
> volatile).  Therefore, it's up to you to do any read or write combining
> that is acceptable; the compiler can't do it for you.  Also, even
> standard memory reads from DDR via the PRU are really volatile, since
> the ARM core is running in the background and could potentially be
> changing the values between each PRU access.
>
> Clean up your code a bit and I expect you'll see much better results! 
>
> -- 
> Charles Steinkuehler 
> [email protected]
>

Thanks a lot, Charles, for the explanation.

Yes, my code is not really optimized; it's only a draft to play with the
I/Os.
I write code for microchips and I thought (my bad) that the PRUSS would
behave the same way regarding I/Os, I mean with direct access.

When I saw the delays, I also thought of reading the whole register and then
masking it to get the pins I need.
You confirmed this: bitmasking will be a lot faster.
As long as I can achieve under 250 ns for reading, it will be fine.
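
The single-read approach can be sketched in C as follows. This is a minimal sketch, not the code from this thread: the bit positions ADC0_BIT to ADC3_BIT, the adc_samples_t type, and the helper names are hypothetical, and the 16-bit accumulator width is an assumption. On the PRU, each snapshot would come from one HWREG(SOC_GPIO_3_REGS + GPIO_DATAIN) read per clock cycle instead of four.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions of the four ADC data lines in one GPIO bank. */
#define ADC0_BIT 14u
#define ADC1_BIT 15u
#define ADC2_BIT 16u
#define ADC3_BIT 17u

/* One accumulator per ADC; 16 bits wide by assumption. */
typedef struct {
    uint16_t ch[4];
} adc_samples_t;

/*
 * Shift one bit per channel (MSB first) out of a single register snapshot.
 * The expensive read happens once, outside this function; everything here
 * works on the local copy.
 */
static void shift_in_snapshot(adc_samples_t *s, uint32_t datain)
{
    s->ch[0] = (uint16_t)((s->ch[0] << 1) | ((datain >> ADC0_BIT) & 1u));
    s->ch[1] = (uint16_t)((s->ch[1] << 1) | ((datain >> ADC1_BIT) & 1u));
    s->ch[2] = (uint16_t)((s->ch[2] << 1) | ((datain >> ADC2_BIT) & 1u));
    s->ch[3] = (uint16_t)((s->ch[3] << 1) | ((datain >> ADC3_BIT) & 1u));
}

/* Build one sample per channel from nbits snapshots, one per clock cycle. */
static adc_samples_t collect(const uint32_t *snapshots, unsigned nbits)
{
    adc_samples_t s = {{0, 0, 0, 0}};

    assert(nbits <= 16); /* must fit in the 16-bit accumulators */
    for (unsigned i = 0; i < nbits; i++)
        shift_in_snapshot(&s, snapshots[i]);
    return s;
}
```

With the figure of roughly 165 ns per GPIO read quoted above, one read per clock cycle fits inside the 250 ns target, whereas four separate reads would cost about 660 ns.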

I had a quick look at your code and will dig into it later.
I think I'll code directly in asm, as I do not have that much to do.
Just an infinite loop to clock 2 pins, read the others, and send the values
through RAM.
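
One pass of such a loop could be structured along these lines. This is a host-testable sketch under stated assumptions, written in C rather than asm: the adc_io_t hooks, acquire_once, and the sim_* stand-ins are hypothetical names, the data lines are assumed to sit on bits 0 to 3 of the value read, and the sampling edge and the number of clocks per conversion are placeholders that must be checked against the ADS8326 datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ADCS 4u

/* Hypothetical hardware hooks: two output pins and one input read. */
typedef struct {
    void     (*set_select)(int level);
    void     (*set_clk)(int level);
    uint32_t (*read_inputs)(void); /* one read of the whole input register */
} adc_io_t;

/*
 * One acquisition: assert select, pulse CLK nclocks times, take a single
 * input read per pulse, then publish the four results to shared memory.
 */
static void acquire_once(const adc_io_t *io, unsigned nclocks,
                         volatile uint16_t *shared)
{
    uint16_t acc[NUM_ADCS] = {0};

    assert(nclocks <= 16); /* accumulators are 16 bits wide */
    io->set_select(0);
    for (unsigned i = 0; i < nclocks; i++) {
        io->set_clk(1);
        io->set_clk(0); /* sampling after this edge is an assumption */
        uint32_t in = io->read_inputs();
        for (unsigned ch = 0; ch < NUM_ADCS; ch++)
            acc[ch] = (uint16_t)((acc[ch] << 1) | ((in >> ch) & 1u));
    }
    io->set_select(1);
    for (unsigned ch = 0; ch < NUM_ADCS; ch++)
        shared[ch] = acc[ch];
}

/* Host-side stand-ins so the sequencing can be checked without hardware. */
static unsigned sim_reads;
static unsigned sim_clk_writes;

static void sim_set_select(int level) { (void)level; }
static void sim_set_clk(int level) { (void)level; sim_clk_writes++; }

static uint32_t sim_read_inputs(void)
{
    /* Channel 0 alternates 1,0,1,0...; channel 3 is always 1. */
    uint32_t v = 0x8u | ((sim_reads & 1u) ? 0u : 1u);
    sim_reads++;
    return v;
}

static const adc_io_t sim_io = { sim_set_select, sim_set_clk, sim_read_inputs };
```

On the PRU, the hooks would be replaced by the real pin writes and a single input read, the call would sit inside the infinite loop, and shared would point at memory the ARM side can see.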

Regards,

Cedric


 
