Hi, Andrew and Griffin,
Looks great overall! Here are a few suggestions (you asked for
them!)...
Clock selection and frequency specification have been a perennial
source of confusion on the ibobs. If the ADC is sampling at 800 MHz
and feeding the ibob at 200 MHz, do you really want to use sys_clk at
100 MHz for the user ip? Will the toolflow even use the "XSG code
config" block's clock settings if the ADC blocks are present?
I've always thought of "8.7" notation as meaning "8 integer bits and
7 fractional bits" and "8_7" as meaning "8 bits, 7 of which are
fractional" (IOW, "8.7" == "15_7" and "1.7" == "8_7"). You define
and use 8.7 differently, which is OK, but it still differs from the
"8_7" format used by Simulink.
The range of an "8_7" signal is -128/128 (i.e. -1) to +127/128 (i.e.
1-1/128).
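
To make the two notations and the range above concrete, here is a small Python sketch. The helper names (fix_range, dot_to_underscore) are made up for illustration and are not part of Simulink or the Xilinx tools; it assumes ordinary two's-complement signed fixed point.

```python
from fractions import Fraction


def fix_range(total_bits, frac_bits, signed=True):
    """Return (min, max, step) of a 'total_frac' fixed-point format as exact fractions."""
    scale = 2 ** frac_bits
    if signed:
        # Two's complement: one more negative code than positive codes.
        lo_raw = -(2 ** (total_bits - 1))
        hi_raw = 2 ** (total_bits - 1) - 1
    else:
        lo_raw = 0
        hi_raw = 2 ** total_bits - 1
    return Fraction(lo_raw, scale), Fraction(hi_raw, scale), Fraction(1, scale)


def dot_to_underscore(int_bits, frac_bits):
    """Convert 'I.F' (integer bits, fractional bits) to 'W_F' (total width, fractional bits)."""
    return int_bits + frac_bits, frac_bits


lo, hi, step = fix_range(8, 7)       # Fix_8_7
print(lo, hi, step)                  # -1 127/128 1/128
print(dot_to_underscore(8, 7))       # (15, 7): "8.7" == "15_7"
print(dot_to_underscore(1, 7))       # (8, 7):  "1.7" == "8_7"
```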
The multipliers are set up to do a squaring operation. The square of
a Fix_8_7 input will range from 0 to +1 in steps of 2^-14. This
range can be represented in a UFix_15_14 format, which actually
covers 0 to 32767/16384, almost twice the range required. If
16383/16384 is an acceptable approximation of +1 (I would argue that
it would be for this application), then the multiplier output width
could be reduced by another bit to UFix_14_14 (and be sure to use the
"saturation" option).
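
A quick way to check the width argument above is to square every possible Fix_8_7 code by brute force. This is only a sketch in plain Python integers (raw values scaled by 2^14), with a hypothetical saturate_unsigned helper standing in for the block's "saturation" option.

```python
def squares_of_fix_8_7():
    """Raw squares of every Fix_8_7 input code; each result is scaled by 2**14."""
    return [raw * raw for raw in range(-128, 128)]


def saturate_unsigned(raw, width):
    """Clamp a non-negative raw value to the largest code of an unsigned 'width'-bit word."""
    return min(raw, 2 ** width - 1)


squares = squares_of_fix_8_7()
max_raw = max(squares)
print(max_raw)                 # 16384, i.e. exactly +1.0 in units of 2**-14
print(max_raw.bit_length())    # 15, so UFix_15_14 holds every result exactly

# Which results change if the output is saturated to UFix_14_14?
clipped = sorted({r for r in squares if saturate_unsigned(r, 14) != r})
print(clipped)                 # [16384]: only the square of the input code -128
```

Only the single input code -128 (i.e. -1) is affected by the narrower output, and it comes out as 16383/16384 instead of +1.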
The adder and mux arrangement is already available prepackaged as the
Xilinx "Accumulator" block (use its "reset to 'b'" option). That saves
layout complexity, and since it is implemented as a Xilinx IP core it
is likely to be more efficient as well.
Might be better to put the latency of 1 in the adder block than to
use a latency of 0 in the adder followed by a delay. The mapper
should push the delay into the flip-flop of the adder's output
slices, but the timing on the adder might be overly tight if it has
zero latency. At the very least, using a latency of one in the adder
will keep the model cleaner. But this will be replaced with the
Accumulator block anyway.
If only 29 bits are needed in the accumulator (I think it's only 27),
don't expand to 32 bits until afterwards. The three (or five) extra
bits in the accumulator will use extra resources, but adding them
afterwards is free.
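
For sizing the accumulator, the worst case is simply the largest input code times the number of samples summed. The sketch below assumes an unsigned accumulator; the accumulation length of 8192 is purely illustrative and is not taken from your design, so plug in the real number.

```python
def accumulator_bits(max_raw_input, n_accumulations):
    """Bits needed to hold the worst-case unsigned sum without overflow."""
    return (max_raw_input * n_accumulations).bit_length()


N = 8192  # hypothetical accumulation length, for illustration only

# Input saturated to UFix_14_14 (largest code 16383):
print(accumulator_bits(2 ** 14 - 1, N))   # 27

# Input left at UFix_15_14 with the true maximum square (code 16384):
print(accumulator_bits(2 ** 14, N))       # 28
```

Whatever width comes out, zero-padding the result up to 32 bits after the accumulator costs no logic.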
Dave