Hi Jonathan

> To start we are looking closely at the FPGA resource utilization of large
> PFBs.  Something that probably is common knowledge amongst those experienced
> in FX correlator design is that the demux factor drives the utilization much
> faster than the size of the PFB.  In that sense bandwidth is far more
> expensive than spectral resolution.  We've put some effort into accurately
> quantifying the utilization, at least as far as multipliers and adders are
> concerned, and are expanding this analysis to block ram and other resources.
>  And demux factor is typically radix 2, so it is very much quantized.
>

Some thoughts on resource usage with the CASPER pfb_fir (for large FFT
sizes, i.e. not doing a straight butterfly);

complex multiplier usage;
  - scales linearly with the demux factor (often bandwidth)
  - scales linearly with number of FIR taps
  - is not affected by the FFT size
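
As a rough sketch of the multiplier scaling listed above (this is not code
from the CASPER library; the function name and the unit constant of
proportionality are assumptions made for illustration), the relative
multiplier count can be written as the product of demux factor and tap
count, with the FFT size deliberately ignored:

```python
def pfb_fir_relative_multipliers(demux, taps, fft_size=None):
    """Relative multiplier count for a PFB FIR front end.

    Hypothetical helper: it encodes only the scaling stated in the text
    (linear in demux factor, linear in taps, independent of FFT size).
    The absolute count depends on the actual block implementation.
    """
    if demux < 1 or taps < 1:
        raise ValueError("demux and taps must be positive")
    # fft_size is accepted only to make its lack of influence explicit.
    return demux * taps


if __name__ == "__main__":
    base = pfb_fir_relative_multipliers(demux=4, taps=4, fft_size=2**12)
    for demux, taps, fft_size in [(8, 4, 2**12), (4, 8, 2**12), (4, 4, 2**16)]:
        rel = pfb_fir_relative_multipliers(demux, taps, fft_size) / base
        print(f"demux={demux} taps={taps} fft_size={fft_size}: x{rel:g}")
```

Doubling either the demux factor or the number of taps doubles the figure,
while changing only the FFT size leaves it untouched.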

adder usage (the final adder tree);
  - scales as n*log(n) with the demux factor, and will dominate adder usage
for large demux factors
  - scales as n*log(n) with the number of FIR taps
  - is not affected by the FFT size

BRAM usage;
  - currently scales linearly with demux factor, but ideally should not be
affected by it (barring constraints set by underlying hardware). BRAMs are
currently not used efficiently: a separate set of coefficient and data
storage BRAMs is not needed for each data input. The storage requirements
should depend only on FFT size and number of FIR taps.
  - scales linearly with the number of FIR taps, although the current design
could be improved to use BRAMs more efficiently.
  - scales linearly with FFT size.

Routing constraints;
The design is simple, highly pipelined (almost no feedback) with very low
fanout. Major constraints are BRAM to DSP slice, DSP slice to DSP slice and
rounding, all of which are parameterised.

Optimisations possible;
The efficiency of BRAM use can be improved with some small logic savings.

Resource usage in the CASPER FFT (when using the biplex FFT, e.g.
fft_wideband_real and fft for 'large' FFTs);

complex multiplier usage;
  - dominated by the (n/2)*log2(n) multipliers (n = demux factor) needed in
fft_direct for large FFTs.
  - scales linearly with increase in FFT size.
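
To put numbers on the (n/2)*log2(n) term, here is a minimal sketch that
evaluates it for a few demux factors. The function name is hypothetical, and
the figure is the plain radix-2 butterfly count; an actual fft_direct build
may use fewer multipliers if trivial twiddles are optimised away.

```python
def fft_direct_butterfly_multipliers(demux):
    """Return (n/2)*log2(n) for a power-of-two demux factor n.

    Hypothetical helper: this is the radix-2 butterfly count quoted in
    the text, not a measurement of any real fft_direct build.
    """
    if demux < 2 or demux & (demux - 1):
        raise ValueError("demux must be a power of two >= 2")
    stages = demux.bit_length() - 1  # exact log2 for a power of two
    return (demux // 2) * stages


if __name__ == "__main__":
    for demux in (2, 4, 8, 16, 32):
        count = fft_direct_butterfly_multipliers(demux)
        print(f"demux={demux:2d}: {count} complex multipliers")
```

Each doubling of the demux factor slightly more than doubles this term,
which is why it dominates for wide-bandwidth designs.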

BRAM usage;
  - scales linearly with bandwidth for large FFTs if FFT size kept constant.
  - for constant (large) FFT size, unaffected by demux factor. Biplex cores
shrink in length by one stage while doubling in number for each doubling in
demux factor.
  - scales roughly like n^2 with increase in FFT size.
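
The trade between biplex length and biplex count can be sketched as follows.
This assumes a 2^N point FFT with a demux factor of 2^P, where the direct
part takes P stages and the biplex cores take the remaining N - P. The
function name is hypothetical, and the core count is given relative to the
demux = 1 case rather than as an absolute number.

```python
def split_fft_stages(log2_fft_size, log2_demux):
    """Split FFT stages between biplex cores and the direct FFT.

    Hypothetical helper, assuming the direct part absorbs log2(demux)
    stages. The core count is relative to a design with demux factor 1.
    """
    if not 0 <= log2_demux <= log2_fft_size:
        raise ValueError("need 0 <= log2_demux <= log2_fft_size")
    return {
        "biplex_stages": log2_fft_size - log2_demux,
        "direct_stages": log2_demux,
        "relative_biplex_cores": 2 ** log2_demux,
    }


if __name__ == "__main__":
    # Fixed 2^15 point FFT, demux factor swept from 4 to 32.
    for log2_demux in range(2, 6):
        print(2 ** log2_demux, split_fft_stages(15, log2_demux))
```

Each step up in demux factor removes one stage from every biplex core and
doubles the number of cores, as described above.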

Routing constraints;
  The FFT is highly pipelined with low fanout except in the unscrambler
(although some work has been done here and the unscrambler is now optional).
Major constraints are BRAM to DSP slice, DSP slice to DSP slice and
rounding, all of which are parameterised.

Optimisations possible;
Various optimisations are still possible;
  - Coefficients could be shared between twiddles, reducing the number of
BRAMs required by the demux factor. This would be significant for large
demux factor designs at the expense of some fanout.
  - BRAMs used for delaying data could be shared between input streams,
saving some BRAMs at the expense of extra routing constraints.
  - As Dan has suggested, grow the bits in the FFT at each stage as needed
to reduce logic (and BRAM) use and probably help timing. Care should be
taken however, as data quality is directly related to the width of the data
path through the FFT.

As noted by Jason, please also remember that other constraints such as QDR
SRAM and XAUI bandwidth need to be considered when building such a large
system.

Dan's suggestion of FFX is worth considering. It is upgradeable, allowing
the addition of newer, more capable boards as they come online until you end
up with a simple FX correlator again.

I would love to see a correlator like that in action.

Regards
Andrew
