Re: [casper] wideband conversion and correlation

Jonathan Weintroub Fri, 24 Dec 2010 05:39:00 -0800

Dear all who responded,

First, I apologize for inadvertently cc'ing the entire list with a message to my internal team. A consequence of using autocomplete in the cc field to make sure I got the list address right. Thankfully I think I only said nice things ;)

Second, I really appreciate all the responses, which are very enlightening. I have little time to read carefully, and less time to respond, as I leave with my family to Cape Town this morning, and don't expect to surface for a good few days. I do look forward to connecting with the SA SKA/KAT group, probably in January, to discuss this and other things in person. Perhaps the discussion will continue nonetheless.


A few quick comments, based only on a scan of the responses.

--the SMA is an 8 antenna array, with two active receivers per antenna. In particular these might be dual pol, thus 16 "ant-pols".

--we are certainly open to distributing the processing in the manner suggested by Dan, Mel, and possibly others. Even in such a scheme, though, an understanding of PFB fit and limits, and, related, increasing clock rates to improve performance is warranted. We are also open to not packetizing (on-board corner turn).

--I made mention of 500 MHz FFT cores; those were advertised by industry DSP specialists we have had discussions with. Not designed with CASPER methods. Multiple clock domains are required, and perhaps we could "black box" one of these cores. I don't think anyone has commented on multiple clock domains in CASPER yet, Billy, anyone? (may have missed it on scanning).

--We need to understand memory util, including bram, qdr, ddr, amount and bandwidth. Will read your comments carefully.

--Andrew, our finding is that *both* multipliers and adders scale as D log2 D, where D is the demux factor (other terms, but this one dominates). If N is the size of the PFB they scale only as log N (I may mis-remember if this is the dominant term). I don't understand the implication of the condition "(for large FFT sizes, i.e. not doing straight butterfly)". I would very much like to discuss all of this with you, and others who might be interested, in CT if possible.

--Dan, your statement that D=64 or 128 would be possible is very encouraging, but appears to contradict what Suraj said. Would very much like to resolve this.

Thanks to all who contributed. In a huge rush, please excuse mis-statements or typos, or questions on matters already addressed. Merry Christmas to those who celebrate it. And look forward to picking up this thread again.


Jonathan


On Dec 24, 2010, at 4:34 AM, Andrew Martens wrote:

Hi Jonathan
To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized.
Some thoughts on resource usage with the CASPER pfb_fir (for large FFT sizes, i.e. not doing straight butterfly);
complex multiplier usage;
  - scales linearly with the demux factor (often bandwidth)
  - scales linearly with number of FIR taps
  - is not affected by the FFT size

adder usage (the final adder tree);
  - scales by nlogn with the demux factor. Will dominate adder usage for large demux factors
  - scales by nlogn with the number of FIR taps
  - is not affected by the FFT size
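
As a rough illustration of the adder-tree point above: a cost that grows as n*log2(n) more than doubles each time n doubles, whereas a linear cost exactly doubles. The sketch below compares growth ratios only. The function names are hypothetical, and no absolute resource counts for pfb_fir are implied.

```python
import math

def linear_growth(n_old, n_new):
    """Growth ratio of a cost that scales linearly with n."""
    return n_new / n_old

def nlogn_growth(n_old, n_new):
    """Growth ratio of a cost that scales as n*log2(n).

    n_old must be greater than 1, since log2(1) == 0.
    """
    return (n_new * math.log2(n_new)) / (n_old * math.log2(n_old))

# Effect of doubling the demux factor, for a few starting points.
for demux in (2, 4, 8, 16, 32, 64):
    print(demux, "->", 2 * demux,
          "linear x%.2f" % linear_growth(demux, 2 * demux),
          "nlogn x%.2f" % nlogn_growth(demux, 2 * demux))
```

The n*log2(n) ratio approaches 2 from above as the demux factor grows, so the penalty per doubling is largest at small demux factors.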

BRAM usage;
  - scales linearly with demux factor but should not be affected by it (barring constraints set by underlying hardware). (BRAMs are currently not used efficiently - a separate set of coefficient and data storage BRAMs is not needed for each data input. The storage requirements should be completely dependent on FFT size and number of FIR taps).
  - scales linearly with the number of FIR taps. The current design could be improved so that BRAMs are more efficiently used though.
  - scales linearly with FFT size.

Routing constraints;
The design is simple, highly pipelined (almost no feedback) with very low fanout. Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised.
Optimisations possible;
The efficiency of BRAM use can be improved with some small logic savings.
Resource usage in the CASPER FFT (when using the biplex FFT, e.g. fft_wideband_real and fft for 'large' FFTs);
complex multiplier usage;
  - dominated by (n/2)*log2(n) (n = demux factor) needed in fft_direct for large FFTs.
  - scales linearly with increase in FFT size.
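
To give a feel for the (n/2)*log2(n) term quoted above, here is a minimal sketch that tabulates it for power-of-two demux factors. It assumes one complex multiplier per radix-2 butterfly, which is what the formula counts. The function name is hypothetical, and the real fft_direct block may use fewer multipliers (for example where twiddle factors are trivial).

```python
def direct_fft_butterflies(n):
    """Butterfly count (n/2)*log2(n) of a radix-2 FFT with n parallel inputs.

    n must be a power of two and at least 2.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * stages

# Assumed: one complex multiplier per butterfly (upper-bound style estimate).
for demux in (2, 4, 8, 16, 32, 64, 128):
    print("demux %3d: %4d butterflies" % (demux, direct_fft_butterflies(demux)))
```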

BRAM usage;
  - scales linearly with bandwidth for large FFTs if FFT size kept constant.
  - for constant (large) FFT size, unaffected by demux factor. Biplex cores shrink in length by one stage while doubling in number for each doubling in demux factor.
  - scales roughly like n^2 with increase in FFT size.

Routing constraints;
The FFT is highly pipelined with low fanout except for in the unscrambler (although some work has been done here and the unscrambler is now optional). Major constraints are BRAM to DSP slice, DSP slice to DSP slice and rounding, all of which are parameterised.
Optimisations possible;
Various optimisations are still possible;
  - Coefficients could be shared between twiddles, reducing the number of BRAMs required by the demux factor. This would be significant for large demux factor designs at the expense of some fanout.
  - BRAMs used for delaying data could be shared between input streams, saving some BRAMs at the expense of extra routing constraints.
  - As Dan has suggested, grow the bits in the FFT at each stage as needed to reduce logic (and BRAM) use and probably help timing. Care should be taken however, as data quality is directly related to the width of the data path through the FFT.
As noted by Jason, please also remember that other constraints such as QDR SRAM and XAUI bandwidth need to be considered when building such a large system.
Dan's suggestion of FFX is worth considering. It is upgradeable, allowing the addition of newer, more capable boards as they come online until you end up with a simple FX correlator again.
I would love to see a correlator like that in action.

Regards
Andrew
