Hi Jonathan

> To start we are looking closely at the FPGA resource utilization of large
> PFBs. Something that probably is common knowledge amongst those experienced
> in FX correlator design is that the demux factor drives the utilization much
> faster than the size of the PFB. In that sense bandwidth is far more
> expensive than spectral resolution. We've put some effort into accurately
> quantifying the utilization, at least as far as multipliers and adders are
> concerned, and are expanding this analysis to block ram and other resources.
> And demux factor is typically radix 2, so it is very much quantized.
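To make the quantization point concrete: if samples arrive at rate fs and the FPGA fabric runs at fclk, the demux factor must be at least fs/fclk, rounded up to the next power of two. The Python sketch below assumes each fabric clock cycle consumes `demux` samples in parallel. The function name and the example rates are hypothetical, chosen only for illustration.

```python
def demux_factor(sample_rate_hz: float, fpga_clock_hz: float) -> int:
    """Smallest power-of-two demux factor that keeps up with the sample rate.

    Assumption (illustrative): each FPGA clock cycle processes `demux`
    samples in parallel, so demux * fpga_clock_hz must cover the sample rate.
    """
    if sample_rate_hz <= 0 or fpga_clock_hz <= 0:
        raise ValueError("rates must be positive")
    demux = 1
    # Double until the parallel throughput covers the sample rate
    # (radix-2 quantization of the demux factor).
    while demux * fpga_clock_hz < sample_rate_hz:
        demux *= 2
    return demux


if __name__ == "__main__":
    # Hypothetical rates: a 1 GS/s stream into fabric clocked at 250 MHz.
    print(demux_factor(1.0e9, 250e6))  # 4
    # 10% more bandwidth forces the next power of two.
    print(demux_factor(1.1e9, 250e6))  # 8
```

In this example a modest increase in bandwidth doubles the demux factor, and with it every resource that scales with the demux factor.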
Some thoughts on resource usage with the CASPER pfb_fir (for large FFT sizes, i.e. not doing a straight butterfly):

complex multiplier usage:
- scales linearly with the demux factor (often bandwidth)
- scales linearly with the number of FIR taps
- is not affected by the FFT size

adder usage (the final adder tree):
- scales as n log n with the demux factor, and will dominate adder usage for large demux factors
- scales as n log n with the number of FIR taps
- is not affected by the FFT size

BRAM usage:
- currently scales linearly with the demux factor, but need not (barring constraints set by the underlying hardware). BRAMs are currently not used efficiently: a separate set of coefficient and data storage BRAMs is not needed for each data input. The storage requirements should depend only on the FFT size and the number of FIR taps.
- scales linearly with the number of FIR taps. The current design could, however, be improved so that BRAMs are used more efficiently.
- scales linearly with the FFT size.

routing constraints:
The design is simple and highly pipelined (almost no feedback), with very low fanout. The major constraints are BRAM to DSP slice, DSP slice to DSP slice, and rounding, all of which are parameterised.

optimisations possible:
The efficiency of BRAM use can be improved, with some small logic savings.

Resource usage in the CASPER FFT (when using the biplex FFT, e.g. fft_wideband_real and fft, for 'large' FFTs):

complex multiplier usage:
- dominated by the (n/2)*log2(n) complex multipliers (n = demux factor) needed in fft_direct for large FFTs
- scales linearly with increase in FFT size

BRAM usage:
- scales linearly with bandwidth for large FFTs if the FFT size is kept constant
- for a constant (large) FFT size, unaffected by the demux factor. Biplex cores shrink in length by one stage while doubling in number for each doubling of the demux factor.
- scales roughly like n^2 with increase in FFT size
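A rough back-of-envelope model of some of the scaling rules above, as a Python sketch. Only the (n/2)*log2(n) fft_direct multiplier count and the "one stage shorter, twice as many cores" biplex rule come from the text. The one-multiplier-per-tap-per-input PFB model, the demux/2 biplex core count, and the FFT size and tap count in the usage lines are assumptions for illustration, not a description of the actual CASPER block internals.

```python
def log2i(x: int) -> int:
    # Exact log2 for powers of two (demux factors and FFT sizes are radix 2).
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def pfb_fir_multipliers(demux: int, taps: int) -> int:
    # Assumed model: one multiplier per FIR tap per parallel input.
    # Linear in demux and taps, independent of FFT size.
    return demux * taps


def fft_direct_multipliers(demux: int) -> int:
    # (n/2)*log2(n) complex multipliers for an n-input direct FFT.
    if demux < 2:
        raise ValueError("demux must be at least 2")
    return (demux // 2) * log2i(demux)


def biplex_layout(fft_size: int, demux: int) -> tuple[int, int]:
    # Assumed structure: demux/2 biplex cores, each covering the
    # log2(fft_size) - log2(demux) stages not handled by fft_direct.
    # Each doubling of demux doubles the cores and removes one stage.
    if demux < 2 or demux > fft_size:
        raise ValueError("need 2 <= demux <= fft_size")
    return demux // 2, log2i(fft_size) - log2i(demux)


if __name__ == "__main__":
    FFT_SIZE, TAPS = 2**14, 4  # hypothetical design point
    print("demux  pfb_mults  fft_direct_mults  biplex cores x stages")
    for d in range(1, 7):
        n = 2**d
        cores, stages = biplex_layout(FFT_SIZE, n)
        print(f"{n:5d}  {pfb_fir_multipliers(n, TAPS):9d}  "
              f"{fft_direct_multipliers(n):16d}  {cores:5d} x {stages}")
```

In this model the fft_direct count grows faster than linearly with the demux factor, while the FFT size only enters through the biplex stage count.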
routing constraints:
The FFT is highly pipelined with low fanout, except in the unscrambler (although some work has been done here and the unscrambler is now optional). The major constraints are BRAM to DSP slice, DSP slice to DSP slice, and rounding, all of which are parameterised.

optimisations possible:
Various optimisations are still possible:
- Coefficients could be shared between twiddles, reducing the number of coefficient BRAMs required by a factor of the demux factor. This would be significant for large demux factor designs, at the expense of some fanout.
- BRAMs used for delaying data could be shared between input streams, saving some BRAMs at the expense of extra routing constraints.
- As Dan has suggested, grow the bits in the FFT at each stage as needed to reduce logic (and BRAM) use and probably help timing. Care should be taken, however, as data quality is directly related to the width of the data path through the FFT.

As noted by Jason, please also remember that other constraints, such as QDR SRAM and XAUI bandwidth, need to be considered when building such a large system.

Dan's suggestion of FFX is worth considering. It is upgradeable, allowing the addition of newer, more capable boards as they come online until you end up with a simple FX correlator again. I would love to see a correlator like that in action.

Regards
Andrew

