* The FFT_wideband_real compiles at 250MHz - I used 6 add latency
(might be overkill), 3 mult, 3 BRAM, 3 convert, use less logic,
truncate, wrap.
This is probably overkill. The DSP48 adder is not pipelined, but it
can register its inputs and outputs. This means at most 2-3 cycles
of latency are useful for timing. The rest of the delay will be
implemented in slices, which will likely cost performance and waste
resources.
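To make the arithmetic above concrete, here is a minimal sketch that
splits a requested adder latency into the stages the DSP48 can absorb
and the stages left over for fabric registers. The default of 3
internal register stages is an assumption taken from the "2-3" figure
above, not a value read from Xilinx documentation; adjust it for your
part and block settings.

```python
def split_adder_latency(requested: int, dsp_internal_regs: int = 3) -> dict:
    """Split a requested adder latency into DSP48-internal and fabric stages.

    dsp_internal_regs is an assumption: roughly how many input/output
    register stages the DSP48 can absorb (2-3 per the discussion above).
    """
    if requested < 0:
        raise ValueError("latency must be non-negative")
    in_dsp = min(requested, dsp_internal_regs)  # absorbed by DSP48 registers
    in_fabric = requested - in_dsp              # left over, built from slices
    return {"in_dsp": in_dsp, "in_fabric": in_fabric}


for lat in (1, 3, 6):
    print(lat, split_adder_latency(lat))
# 1 {'in_dsp': 1, 'in_fabric': 0}
# 3 {'in_dsp': 3, 'in_fabric': 0}
# 6 {'in_dsp': 3, 'in_fabric': 3}
```

With the 6-cycle add latency used in the FFT, roughly half of the
delay would end up as slice registers under this assumption.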
Jason, I've found that multipliers and adders don't get merged into
the same DSP48 slice, even when the latencies are set properly (3
cycles multiply, 1 cycle add). Have you seen this merge occur in a
design?
-Suraj
On Sep 10, 2010, at 4:01 AM, Danny Price wrote:
Thanks everyone, I've managed to get a modified tutorial 3 to
compile (with Jack and Dave's help) and will check its performance
this afternoon. For the mailing list record, here are some notes on
getting things running at 250MHz:
* The pfb_fir_real caused timing issues due to its adders and its
convert. I changed the adder latency to 4, then broke the mask and
init script and manually went into the adder blocks, changing the
implementation from "Use behavioral HDL" to "Pipeline for maximum
performance" and "Implement using DSP48". Similarly, for the
convert blocks in the pfb, make sure "Pipeline for maximum
performance" is selected. I've also set these to truncate instead
of round.
* The FFT_wideband_real compiles at 250MHz - I used 6 add latency
(might be overkill), 3 mult, 3 BRAM, 3 convert, use less logic,
truncate, wrap.
* The round block (under the quant0 block): pipelined the converts
as above
* The adder in the vector accumulators vacc0 and vacc1
* The counter in acc_cntrl: changed to implement using DSP48
* The counters in the pulse extenders: also changed to DSP48
In addition to this, I added extra pipeline delays in a few
places.
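The convert settings in the notes above (truncate instead of round,
wrap on overflow) are the cheap options. The sketch below models
fixed-point requantization on plain integers to show why: truncating
is only an arithmetic shift and wrapping only drops the top bits,
whereas rounding needs an add (a carry chain) and saturating needs
comparators and a mux. It uses round-half-up as one common rounding
mode; System Generator's own round mode may differ, and the function
name and parameters are illustrative, not a library API.

```python
def quantize(x: int, in_frac: int, out_bits: int, out_frac: int,
             quant: str = "truncate", overflow: str = "wrap") -> int:
    """Requantize a signed fixed-point integer (x has in_frac fractional
    bits) to a signed out_bits-wide word with out_frac fractional bits."""
    k = in_frac - out_frac  # number of LSBs to drop
    if k < 0:
        raise ValueError("sketch only handles dropping fractional bits")
    if quant == "truncate":
        # Dropping LSBs is just wiring: an arithmetic shift, no adder.
        y = x >> k
    elif quant == "round":
        # Round half up: needs an add (carry chain) before the shift.
        y = (x + (1 << (k - 1))) >> k if k > 0 else x
    else:
        raise ValueError(f"unknown quantization mode: {quant}")

    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    if overflow == "wrap":
        # Keep the low out_bits and reinterpret as two's complement.
        y &= (1 << out_bits) - 1
        if y > hi:
            y -= 1 << out_bits
    elif overflow == "saturate":
        # Needs comparators and a mux to clamp to the representable range.
        y = max(lo, min(hi, y))
    else:
        raise ValueError(f"unknown overflow mode: {overflow}")
    return y


# 107 with 4 fractional bits is 6.6875; output is 6 bits, 2 fractional.
print(quantize(107, 4, 6, 2, "truncate"))            # 26 (6.5)
print(quantize(107, 4, 6, 2, "round"))               # 27 (6.75)
# 200 with 4 fractional bits is 12.5, which does not fit in 6 bits.
print(quantize(200, 4, 6, 2, "truncate", "wrap"))      # -14
print(quantize(200, 4, 6, 2, "truncate", "saturate"))  # 31
```

Wrapping gives a badly wrong value on overflow, so it is only safe
where the bit growth has been planned for.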
The moral of the story is that DSP48s and pipelining are the key to
meeting timing. If you increase the delay somewhere, make sure you
match it in the parallel paths so everything stays in sync. The
timing report gives you a good clue as to what's failing, and once
you know where to look it's not too hard to fix. Of course, you'll
be using a lot more DSP48s, so large designs will likely run out.
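The advice to match any added delay in other paths amounts to a small
calculation: the slowest path into a merge point sets the target, and
every other path needs the difference. A minimal sketch, with
hypothetical path names and latencies:

```python
def balance_delays(path_latency: dict[str, int]) -> dict[str, int]:
    """Extra delay each path needs so all paths reach the merge point together."""
    if not path_latency:
        return {}
    target = max(path_latency.values())  # the slowest path sets the alignment
    return {name: target - lat for name, lat in path_latency.items()}


# Hypothetical example: after adding latency to the data path, the sync
# and valid paths must be delayed by the difference to stay aligned.
paths = {"data": 120, "sync": 117, "valid": 118}
print(balance_delays(paths))  # {'data': 0, 'sync': 3, 'valid': 2}
```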
Hope that helps!
Cheers
Danny
On 07/09/2010 18:08, David MacMahon wrote:
On Sep 3, 2010, at 8:20 , Jack Hickish wrote:
After a compile fails, it's worth checking the timing report in
the compile directory ..../XPS_ROACH_BASE/implementation/system.twr
Whilst a little bit cryptic, the report should at least give you
some idea of which bits of the design are causing timing failure.
It usually becomes clear whether the culprit is, for example, the
adders in the FIR or the casts in the FFT.
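Pulling the failing paths out of the timing report can be scripted.
The sketch below assumes the report contains path entries of the form
"Slack (setup path): -0.512ns ..." followed by "Source:" and
"Destination:" lines; that layout is an assumption, so check it
against your own system.twr before relying on it. The sample text and
net names are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumed report layout; verify against a real system.twr.
SLACK_RE = re.compile(r"Slack(?: \([^)]*\))?:\s*(-?\d+(?:\.\d+)?)ns")
FIELD_RE = re.compile(r"^\s*(Source|Destination):\s*(\S+)")


def failing_paths(lines: Iterable[str]) -> List[Tuple[float, str, str]]:
    """Collect (slack_ns, source, destination) for paths with negative slack."""
    results = []
    slack, src = None, None
    for line in lines:
        m = SLACK_RE.search(line)
        if m:
            slack, src = float(m.group(1)), None
            continue
        f = FIELD_RE.match(line)
        if f and slack is not None:
            if f.group(1) == "Source":
                src = f.group(2)
            elif src is not None:  # Destination follows Source
                if slack < 0:
                    results.append((slack, src, f.group(2)))
                slack, src = None, None
    return sorted(results)  # worst (most negative) slack first


# Hypothetical excerpt in the assumed format.
SAMPLE = """\
Slack (setup path):     -0.512ns (requirement - (data path - clock path skew + uncertainty))
  Source:               pfb_fir_real/adder_1/dout_reg
  Destination:          pfb_fir_real/convert_1/dout_reg
Slack (setup path):     0.104ns (requirement - (data path - clock path skew + uncertainty))
  Source:               fft/a_reg
  Destination:          fft/b_reg
Slack (setup path):     -0.210ns (requirement - (data path - clock path skew + uncertainty))
  Source:               fft_wideband_real/convert/x_reg
  Destination:          fft_wideband_real/convert/y_reg
"""

if __name__ == "__main__":
    for slack, src, dst in failing_paths(SAMPLE.splitlines()):
        print(f"{slack:+.3f} ns  {src} -> {dst}")
```

For a real report, pass an open file handle for system.twr instead
of SAMPLE.splitlines(); the path prefixes in the hierarchical names
are usually enough to tell whether the FIR adders or the FFT casts
are to blame.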
I concur! Blindly adding additional latency will certainly change
things and may result in a design that can meet timing, but it can
(and often does) also result in unnecessary additional resource
utilization. Having a better understanding of where timing is
failing will lead to a more targeted solution.
I think it would be useful to track where timing problems occur
across different designs. If this points to a particular library
block as problematic across multiple designs, it would serve as
justification for additional work on that block.
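Tracking problem blocks across designs could start as simply as a
tally of which library blocks show up in failing paths, counting each
block at most once per design. This sketch assumes the first component
of each failing path name identifies the library block, which is a
simplification (real netlist names carry extra hierarchy); the design
names and paths are hypothetical.

```python
from collections import Counter


def blocks_failing_across_designs(failures: dict[str, list[str]]) -> Counter:
    """Count in how many designs each block appears in a failing path.

    Assumes the first '/'-separated component of a path names the block.
    """
    counts: Counter = Counter()
    for paths in failures.values():
        # Count each block at most once per design, however many paths fail.
        blocks = {path.split("/")[0] for path in paths}
        counts.update(blocks)
    return counts


# Hypothetical failing paths collected from three designs.
failures = {
    "tut3_250": ["pfb_fir_real/adder_1/q", "fft_wideband_real/convert/q"],
    "spec_b": ["pfb_fir_real/convert_2/q", "pfb_fir_real/adder_4/q"],
    "spec_c": ["vacc0/adder/q", "pfb_fir_real/adder_3/q"],
}
for block, n in blocks_failing_across_designs(failures).most_common():
    print(f"{block}: fails timing in {n} design(s)")
```

A block that tops this list across many designs would be the kind of
candidate for additional library work suggested above.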
Dave