Dear all who responded,

First, I apologize for inadvertently cc'ing the entire list with a message to my internal team. A consequence of using autocomplete in the cc field to make sure I got the list address right. Thankfully I think I only said nice things ;)

Second, I really appreciate all the responses which are very enlightening. I have little time to read carefully, and less time to respond, as I leave with my family to Cape Town this morning, and don't expect to surface for a good few days. I do look forward to connecting with the SA SKA/KAT group, probably in January to discuss this and other things in person. Perhaps the discussion will continue nonetheless.

A few quick comments, based only on a scan of the responses.

--the SMA is an 8 antenna array, with two active receivers per antenna. In particular, these might be dual pol, giving 16 "ant-pols".

--we are certainly open to distributing the processing in the manner suggested by Dan, Mel, and possibly others. Even in such a scheme, though, an understanding of PFB fit and limits, and, related, increasing clock rates to improve performance is warranted. We are also open to not packetizing (on-board corner turn).

--I made mention of 500 MHz FFT cores; those were advertised by industry DSP specialists we have had discussions with, and were not designed with CASPER methods. Multiple clock domains are required, and perhaps we could "black box" one of these cores. I don't think anyone has commented on multiple clock domains in CASPER yet. Billy, anyone? (I may have missed it on scanning.)

--We need to understand memory util, including bram, qdr, ddr, amount and bandwidth. Will read your comments carefully.

--Andrew, our finding is that *both* multipliers and adders scale as D log2(D), where D is the demux factor (there are other terms, but this one dominates). If N is the size of the PFB, they scale only as log(N) (I may mis-remember whether this is the dominant term). I don't understand the implication of the condition "(for large FFT sizes, i.e not doing straight butterfly)". I would very much like to discuss all of this with you, and others who might be interested, in CT if possible.

--Dan your statement that D=64 or 128 would be possible is very encouraging, but appears to contradict what Suraj said. Would very much like to resolve this.



Thanks to all who contributed. I am in a huge rush, so please excuse any misstatements or typos, or questions on matters already addressed. Merry Christmas to those who celebrate it. I look forward to picking up this thread again.

Jonathan


On Dec 24, 2010, at 4:34 AM, Andrew Martens wrote:

Hi Jonathan

To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized.

Some thoughts on resource usage with the CASPER pfb_fir (for large FFT sizes, i.e not doing straight butterfly);

complex multiplier usage;
  - scales linearly with the demux factor (often bandwidth)
  - scales linearly with number of FIR taps
  - is not affected by the FFT size

adder usage (the final adder tree);
  - scales by nlogn with the demux factor. Will dominate adder usage for large demux factors
  - scales by nlogn with the number of FIR taps
  - is not affected by the FFT size

BRAM usage;
  - currently scales linearly with demux factor, but should not be affected by it (barring constraints set by underlying hardware). (BRAMs are currently not used efficiently - a separate set of coefficient and data storage BRAMs is not needed for each data input. The storage requirements should be completely dependent on FFT size and number of FIR taps).
  - scales linearly with the number of FIR taps. The current design could be improved so that BRAMs are more efficiently used though.
  - scales linearly with FFT size.
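
The pfb_fir scaling rules listed above can be turned into a rough calculator. This is only a sketch: the function names (linear_ratio, nlogn_ratio, pfb_fir_growth) are hypothetical, it changes one parameter at a time with the others held fixed, and it returns growth ratios rather than actual multiplier, adder or BRAM counts. The BRAM entry for demux reflects the current design, not the improved one described above.

```python
import math


def constant_ratio(old, new):
    """Growth factor for a resource that a parameter does not affect."""
    return 1.0


def linear_ratio(old, new):
    """Growth factor for a resource that scales linearly with a parameter."""
    return new / old


def nlogn_ratio(old, new):
    """Growth factor for a resource that scales as n*log2(n) with a parameter."""
    if old < 2 or new < 2:
        raise ValueError("n*log2(n) ratio needs both values >= 2")
    return (new * math.log2(new)) / (old * math.log2(old))


# Scaling rules as stated in the list above (one parameter varied at a time).
PFB_FIR_RULES = {
    "demux": {"multipliers": linear_ratio, "adders": nlogn_ratio, "bram": linear_ratio},
    "taps": {"multipliers": linear_ratio, "adders": nlogn_ratio, "bram": linear_ratio},
    "fft_size": {"multipliers": constant_ratio, "adders": constant_ratio, "bram": linear_ratio},
}


def pfb_fir_growth(param, old, new):
    """Relative growth of each pfb_fir resource when `param` goes from old to new."""
    return {res: rule(old, new) for res, rule in PFB_FIR_RULES[param].items()}


if __name__ == "__main__":
    # Doubling the demux factor versus doubling the FFT size.
    print("demux 8 -> 16:     ", pfb_fir_growth("demux", 8, 16))
    print("fft_size 1k -> 2k: ", pfb_fir_growth("fft_size", 1024, 2048))
```

Comparing the two printed lines shows the point made earlier in numbers: doubling the demux factor grows every resource, and the adders by more than a factor of two, while doubling the FFT size only touches BRAM.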

Routing constraints;
The design is simple, highly pipelined (almost no feedback) with very low fanout. Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised.

Optimisations possible;
The efficiency of BRAM use can be improved with some small logic savings.

Resource usage in the CASPER FFT (when using the biplex FFT, e.g. fft_wideband_real and fft for 'large' FFTs);

complex multiplier usage;
  - dominated by (n/2)*log2(n) (n = demux factor) needed in fft_direct for large FFTs.
  - scales linearly with increase in FFT size.

BRAM usage;
  - scales linearly with bandwidth for large FFTs if FFT size kept constant.
  - for constant (large) FFT size, unaffected by demux factor. Biplex cores shrink in length by one stage while doubling in number for each doubling in demux factor.
  - scales roughly like n^2 with increase in FFT size.
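
To put numbers on the multiplier term and the biplex remark above, here is a small sketch. The function names are hypothetical, and two things are assumptions rather than statements about the CASPER blocks: the multiplier count is the plain (n/2)*log2(n) butterfly count with no trivial twiddles removed, and the last log2(demux) stages of the FFT are taken to be the ones handled in fft_direct.

```python
def log2_exact(value):
    """Return log2(value) as an int, insisting that value is a power of two >= 2."""
    exponent = value.bit_length() - 1
    if value < 2 or value != 1 << exponent:
        raise ValueError("expected a power of two >= 2, got %r" % (value,))
    return exponent


def fft_direct_complex_mults(demux):
    """Complex multipliers in an n-input direct FFT: (n/2)*log2(n), n = demux factor."""
    return (demux // 2) * log2_exact(demux)


def biplex_stages(fft_size, demux):
    """Stages left in each biplex core (assumes fft_direct takes log2(demux) stages)."""
    return log2_exact(fft_size) - log2_exact(demux)


if __name__ == "__main__":
    fft_size = 4096  # illustrative value only, not taken from any real design
    for demux in (8, 16, 32, 64, 128):
        print(demux, fft_direct_complex_mults(demux), biplex_stages(fft_size, demux))
```

Each doubling of the demux factor removes one stage from every biplex core while more than doubling the fft_direct multiplier count, which is why D=64 or 128 is the interesting regime to check against the multipliers available on a given FPGA.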

Routing constraints;
The FFT is highly pipelined with low fanout except for in the unscrambler (although some work has been done here and the unscrambler is now optional). Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised.

Optimisations possible;
Various optimisations are still possible;
  - Coefficients could be shared between twiddles, reducing the number of BRAMs required by the demux factor. This would be significant for large demux factor designs at the expense of some fanout.
  - BRAMs used for delaying data could be shared between input streams, saving some BRAMs at the expense of extra routing constraints.
  - As Dan has suggested, grow the bits in the FFT at each stage as needed to reduce logic (and BRAM) use and probably help timing. Care should be taken however, as data quality is directly related to the width of the data path through the FFT.

As noted by Jason, please also remember that other constraints such as QDR SRAM and XAUI bandwidth need to be considered when building such a large system.

Dan's suggestion of FFX is worth considering. It is upgradeable, allowing the addition of newer, more capable boards as they come online until you end up with a simple FX correlator again.

I would love to see a correlator like that in action.

Regards
Andrew



