Hi Dave,

 

Thanks for this info.

>> One thing you could try is to add even more delay stages so that you have
>> more leftover after some get combined into the macrocells.

That’s exactly what I did. If I understand correctly, the maximum pipeline
depth of a DSP48 is four; any registers beyond that are forced into the fabric
and can be used to bridge wiring delays. When cascading is used this might be
different, though.
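
A rough sketch of that reasoning, assuming the tools absorb as many register stages as the macrocell can hold and leave the rest in the fabric. The limit of four is only the understanding stated above, not a verified figure, and the function name `split_delay_stages` is made up for illustration; the real tools decide for themselves how many registers to absorb.

```python
def split_delay_stages(total_stages: int, absorb_limit: int = 4) -> tuple[int, int]:
    """Split a delay chain into (absorbed, fabric) register counts.

    absorb_limit is an ASSUMPTION: the number of pipeline stages the
    macrocell (e.g. a DSP48) can take in. Stages beyond that are assumed
    to stay in the fabric, where they can bridge wiring delays.
    """
    if total_stages < 0 or absorb_limit < 0:
        raise ValueError("stage counts must be non-negative")
    absorbed = min(total_stages, absorb_limit)
    fabric = total_stages - absorbed
    return absorbed, fabric


if __name__ == "__main__":
    # Compare a few delay settings under the assumed limit of four.
    for stages in (2, 4, 6):
        absorbed, fabric = split_delay_stages(stages)
        print(f"{stages} stages: {absorbed} absorbed, {fabric} left in fabric")
```

Under this simple model, only delay settings above the assumed limit leave any registers in the fabric for routing.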

I guess normally one would do this by going to the Function Block Parameters
and setting the latencies accordingly. However, I have made so many manual
ad-hoc changes to the Simulink model that this is not an option any more.

I’m sure this is not the best way in general to apply optimizations to the
design. However, to the average engineer, the Simulink model is all that is
exposed. It appears I’m caught in an endless loop between Simulink,
Casper_xps, PlanAhead and back to Simulink.

 

Cheers

Guenter

 

 

From: David MacMahon [mailto:dav...@berkeley.edu] 
Sent: Thursday, 22 September 2016 07:46
To: Guenter Knittel
Cc: casper list
Subject: Re: [casper] FFT speed optimizations

 

Hi, Guenter,

 

I can’t help specifically with specifying slow (multi-cycle) signals in
Simulink, but I can comment on this:

 

On Sep 21, 2016, at 04:46, Guenter Knittel <gknit...@mpifr-bonn.mpg.de> wrote:

 

- The second problem can also be found often and occurs for example in
fft_wideband_real/fft_direct/butterfly3_x/twiddle between coeff_gen and
bus_mult. From the MUX to the (complex) multiplier I count 4 pipeline stages
of delay, but in the device diagram in PlanAhead I can only see two. On the
other hand, the multipliers appear to use more registers than is indicated in
the Simulink diagram. Can it be that the tools have moved some delay stages
into the macrocell to save resources, but defeating their purpose? If so, can
I do something about it?

 

Yes, it is often the case that the tools will move some register stages into
the macrocells. This is not done so much to save resources but rather to
utilize the flip-flops built into the macrocells (e.g. DSP48 blocks and BRAMs).
Using these embedded registers usually results in faster operation of the
macrocells. You might feel a sense of victory if you manage to prevent the
tools from "absorbing" these registers into the macrocells, but that victorious
feeling would likely be short-lived because the timing problems would probably
get worse. One thing you could try is to add even more delay stages so that
you have more leftover after some get combined into the macrocells.

 

HTH,

Dave

 
