Hi Dave,
thanks for this info.

>> One thing you could try is to add even more delay stages so that you have
>> more leftover after some get combined into the macrocells.

That’s exactly what I did. If I understand correctly, the maximum pipeline depth of a DSP48 is four; above that, all registers are forced into the fabric and can be used to bridge wiring delays. When cascading is used this might be different, though.

I guess normally one would do this by going to the Function Block Parameters and setting the latencies accordingly. However, I have made so many manual ad-hoc changes to the Simulink model that this is not an option any more. I’m sure this is not the best way in general to apply optimizations to the design. However, to the average engineer, the Simulink model is all that is exposed. It appears I’m caught in an endless loop between Simulink, Casper_xps, PlanAhead and back to Simulink.

Cheers
Guenter

From: David MacMahon [mailto:dav...@berkeley.edu]
Sent: Thursday, 22 September 2016 07:46
To: Guenter Knittel
Cc: casper list
Subject: Re: [casper] FFT speed optimizations

Hi, Guenter,

I can’t help specifically with specifying slow (multi-cycle) signals in Simulink, but I can comment on this:

On Sep 21, 2016, at 04:46, Guenter Knittel <gknit...@mpifr-bonn.mpg.de> wrote:

> - The second problem can also be found often and occurs for example in
>   fft_wideband_real/fft_direct/butterfly3_x/twiddle between coeff_gen and
>   bus_mult. From the MUX to the (complex) multiplier I count 4 pipeline
>   stages of delay, but in the device diagram in PlanAhead I can only see
>   two. On the other hand, the multipliers appear to use more registers than
>   is indicated in the Simulink diagram. Can it be that the tools have moved
>   some delay stages into the macrocell to save resources, but defeating
>   their purpose? If so, can I do something about it?

Yes, it is often the case that the tools will move some register stages into the macrocells.
This is not done so much to save resources but rather to utilize the flip-flops built into the macrocells (e.g. DSP48 blocks and BRAMs). Using these embedded registers usually results in faster operation of the macrocells. You might feel a sense of victory if you manage to prevent the tools from "absorbing" these registers into the macrocells, but that victorious feeling would probably be short-lived because the timing problems would probably get worse.

One thing you could try is to add even more delay stages so that you have more leftover after some get combined into the macrocells.

HTH,
Dave
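
The register-absorption behaviour discussed in this thread can be sketched as simple arithmetic. The Python snippet below is only an illustrative worst-case model, not a description of what the Xilinx tools actually do. It assumes the macrocell can absorb up to four register stages (the DSP48 figure Guenter mentions, which he himself states tentatively), and the function names `split_delay_stages` and `stages_needed` are hypothetical.

```python
def split_delay_stages(total_stages, absorb_limit=4):
    """Split a delay chain into (absorbed, fabric) register counts.

    Worst-case model: the tools pull as many stages as possible into the
    macrocell. absorb_limit=4 is an assumption taken from the thread.
    """
    if total_stages < 0 or absorb_limit < 0:
        raise ValueError("stage counts must be non-negative")
    absorbed = min(total_stages, absorb_limit)
    fabric = total_stages - absorbed
    return absorbed, fabric


def stages_needed(fabric_wanted, absorb_limit=4):
    """Delay stages to request so that fabric_wanted remain in the fabric,
    even if the macrocell absorbs its full limit."""
    if fabric_wanted < 0 or absorb_limit < 0:
        raise ValueError("stage counts must be non-negative")
    return absorb_limit + fabric_wanted


if __name__ == "__main__":
    for n in (2, 4, 6):
        absorbed, fabric = split_delay_stages(n)
        print(f"{n} stages requested: {absorbed} absorbed, {fabric} left in fabric")
    print("stages to request for 2 fabric registers:", stages_needed(2))
```

Under this model, a chain of 4 stages leaves nothing in the fabric, while 6 stages leave 2 available for bridging routing delay. The tools may well absorb fewer stages than the limit, as in the twiddle example above where two of the four stages remained visible in PlanAhead.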