Hi Suraj
> Jason, I've found that multipliers and adders don't get merged into the
> same DSP48 slice, even when the latencies are set properly (3 cycles
> multiply, 1 cycle add). Have you seen this merge occur in a design?

This merge only happens under certain conditions. It is one of the reasons why the low-level twiddle block in the FFT is drawn differently for the Virtex-5 when compared to the Virtex-2. This may have changed slightly in later versions of ISE, but in the 10 series I observed the following:

* both Adder and Multiplier needed to be implemented as behavioral HDL
* both Adder and Multiplier had to have Full output precision
* no logic should occur between the two; this included Slice/Concat blocks. I changed the pfb_fir_real as there were Concat and Slice blocks between multipliers and adders, which caused this optimisation not to occur.
* the latency between the Multiplier and Adder had to be set so that all of it could be absorbed in the pipeline registers within the DSP48E. Normally 2/3 for Multiplier and 1 for Adder worked fine.

These all need to be set correctly before a merge occurs. Try building a simple two-adders-into-a-multiplier design and compile using System Generator. Then look at the results (click the Show Details button on the System Generator icon) to see if merging occurs correctly. As I said, ISE may have got better in the 11 series (and in later 10 versions) at recognising where this merge can occur.

> This is probably overkill. The DSP48 adder is not pipelined, but it does
> use registers to register the inputs and outputs. This means a maximum
> of 2-3 are useful for timing. The rest of the delay will be implemented
> in slices, which will likely result in performance loss and wasted
> resources.

To add to what Suraj has said, the DSP48E, when used as an adder, can absorb one register on the input and one on the output. It may be useful to place one register on the output of the logic before this add, and to add a latency of 1 to the add so that a register can be placed just before the logic after the add. This should reduce the amount of routing delay to and from the DSP48E. A useful amount of delay for the add is thus 3, with an additional register stage added just before the add.

Regards
Andrew
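
P.S. The merge conditions above can be written down as a rough checklist. The following is only a sketch in Python, not anything System Generator or ISE provides: the `Block` class, its fields, and `merge_blockers` are hypothetical names made up for illustration, and the latency limits (2 or 3 for the Multiplier, 1 for the Adder) are simply the values that worked for me in the 10 series, not a documented rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical summary of one block's settings (not a real API)."""
    behavioral_hdl: bool  # block is implemented as behavioral HDL
    full_precision: bool  # output precision is set to Full
    latency: int          # latency configured on the block


def merge_blockers(mult: Block, add: Block, logic_between: bool) -> List[str]:
    """Return the reasons a Multiplier/Adder pair may fail to merge.

    An empty list means every condition from the checklist is met.
    The latency limits are assumptions taken from observed behaviour.
    """
    reasons = []
    for name, block in (("Multiplier", mult), ("Adder", add)):
        if not block.behavioral_hdl:
            reasons.append(f"{name} is not behavioral HDL")
        if not block.full_precision:
            reasons.append(f"{name} does not use Full output precision")
    if logic_between:
        reasons.append("logic (e.g. Slice/Concat) sits between the two blocks")
    if mult.latency not in (2, 3):
        reasons.append("Multiplier latency is not 2 or 3")
    if add.latency != 1:
        reasons.append("Adder latency is not 1")
    return reasons


if __name__ == "__main__":
    # A pair configured as in the original question (3 cycles multiply, 1 add).
    for reason in merge_blockers(Block(True, True, 3), Block(True, True, 1), False):
        print(reason)
```

An empty result only means the settings match the checklist; whether the tools actually merge the pair still has to be confirmed from the Show Details report.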

