On 2. it seems to me that if we are digitizing a 9 GHz band at 20 Gsps, one still needs substantial demux (at least 64) no matter how small the PFB. As Suraj points out this is far in excess of practical limits. This squares with what we have found: BW is the difficult part, large PFB for high res less so.


hi jonathan,

i agree you need to demux 20 Gsps by 64 or 128, but i don't think this will be a problem.
20 Gsps should fit pretty easily into an FPGA in an FFX correlator:

in my example of the FFX, you'd need to implement an 8 point PFB
on the first FPGA to break the 10 GHz band into 8 sub-bands.
let's assume you do demux of 64, and clock the FPGA at 312.5 MHz:
you'd need 64*8 multipliers to implement the FIR part of an 8 tap PFB.
and 64 * 16 multipliers to implement the real to complex FFT part of the PFB.
all the multipliers have fixed coefficients, so no block rams are needed
for delays or coefficients, as you'd
implement the butterfly diagram directly.
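
The tally above can be checked with a short Python sketch. The function name and the per-lane defaults (8 FIR multipliers and 16 FFT multipliers per demuxed sample lane) are illustrative assumptions lifted from the figures in this message, not part of any CASPER library.

```python
def pfb_multipliers(demux, fir_mults_per_lane=8, fft_mults_per_lane=16):
    """Multiplier count for the 8-channel PFB sketched in this message.

    Assumes the cost scales linearly with the demux factor, using the
    per-lane figures quoted above (hypothetical helper, not a CASPER API).
    """
    fir = demux * fir_mults_per_lane   # FIR part of the 8 tap PFB
    fft = demux * fft_mults_per_lane   # real to complex FFT part
    return fir, fft, fir + fft


if __name__ == "__main__":
    for demux in (64, 128):
        fir, fft, total = pfb_multipliers(demux)
        print(f"demux {demux}: FIR {fir} + FFT {fft} = {total} multipliers")
```

With these assumptions, demux 64 gives 512 + 1024 = 1536 multipliers, and doubling the demux to 128 doubles the total to 3072.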

so there's no coefficient routing, but there is data routing.
the data paths can all be 8 bit, and you can add pipeline registers
where needed, so you should be able to get to 312.5 MHz.

if you can't get the FPGA to route at 312.5 MHz, then you'd have
to demux by 128, and you'd need twice as many multipliers.
(instead of 1536 multipliers, it would take 3072 multipliers).
you can use block rams for many of the multipliers, as most of the
computations are multiplying 8 bit data by a fixed coefficient,
so an 8 input, 8 output look up table is all you need.
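
As a rough illustration of the look up table idea, here is a minimal Python model of a fixed-coefficient multiplier built as a 256-entry table. The names `build_lut` and `lut_multiply`, the signed 8 bit two's complement data format, and the round-and-saturate behaviour are assumptions made for this sketch; a real design would choose its own scaling and rounding.

```python
def build_lut(coeff):
    """Build a 256-entry table mapping a signed 8 bit input to the
    rounded, saturated 8 bit product with one fixed coefficient.

    The table address is the raw 8 bit pattern (two's complement).
    """
    lut = []
    for addr in range(256):
        x = addr - 256 if addr >= 128 else addr   # decode two's complement
        y = round(x * coeff)                      # Python rounds half to even
        y = max(-128, min(127, y))                # saturate to 8 bits
        lut.append(y)
    return lut


def lut_multiply(lut, x):
    """Multiply signed 8 bit x by the table's coefficient via one lookup."""
    return lut[x & 0xFF]                          # 8 bit pattern is the address


if __name__ == "__main__":
    lut = build_lut(0.5)
    print(lut_multiply(lut, 100), lut_multiply(lut, -100))
```

Each fixed coefficient needs its own table, which is why this only pays off when the coefficients never change at run time.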

if you don't want to implement an 8 channel PFB,
you could also implement this as eight DDC's running in parallel
from the same ADC data, each DDC with a different downmix frequency.
the mixer coefficients are fixed, and many of the coefficients are 0, 1, -1. the DDC's low pass filter coefficients are fixed as well - you can use look up tables for the low pass filter multipliers and the mixer multipliers if you are short on DSP48's.

best wishes,

dan



BTW I realize as I write that my 6 GHz BW demux 32 case suggested in response to Suraj still requires > 400 MHz FPGA clock, thus not so practical. Can one gain a factor of 2 in demux doing quadrature sampling, and having I and Q inputs to a complex input PFB each at 1/2 the rate?

Jonathan


On Dec 23, 2010, at 5:24 PM, Dan Werthimer wrote:



hi jonathan,

some ideas for your correlator:

1)
300 MHz is a good target, especially for V6.
suraj has shown how to achieve 375 MHz for V5
by using floor planning and auto-placing.
suraj or i can send you his draft paper on this if you'd like.

2)
you might want to consider FFX instead of FX:
eg: digitizing your 9 GHz band and using a PFB to break it up into eight sub-bands
of 1.25 GHz each, and then sending the sub-bands into eight 1.25 GHz
FX correlators. this will simplify your switch requirements and each correlator now has only 4K channels, which is better suited for corner turning in a roach II.
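
The sub-band numbers in this suggestion can be sanity-checked with a few lines of Python. This is only a sketch: the helper name `ffx_plan` is hypothetical, and it assumes the 20 Gsps sample rate discussed elsewhere in the thread, so the PFB splits the full 10 GHz Nyquist band (which contains the 9 GHz band of interest) into eight equal parts.

```python
def ffx_plan(sample_rate_gsps, n_subbands, channels_per_subband):
    """Return (sub-band width in GHz, channel width in kHz) for an FFX split.

    Assumes real sampling, so the usable band is half the sample rate,
    and that the coarse PFB divides it into equal sub-bands.
    """
    nyquist_ghz = sample_rate_gsps / 2.0
    subband_ghz = nyquist_ghz / n_subbands
    channel_khz = subband_ghz * 1e6 / channels_per_subband
    return subband_ghz, channel_khz


if __name__ == "__main__":
    subband_ghz, channel_khz = ffx_plan(20, 8, 4096)
    print(f"{subband_ghz} GHz per sub-band, {channel_khz:.1f} kHz per channel")
```

Under these assumptions each sub-band is 1.25 GHz wide and 4K channels give about 305 kHz per channel, in line with the roughly 300 kHz target in the original request.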

3)
also, be sure to use billy's latest FFT, (recently checked in),
which moves all the adders and multipliers into DSP48's and makes routing easier.
you should also consider bit growth FFT's and PFB's, which start
out with the 4 or 5 or 8 bits from your ADC, and add bits gradually
as you move into the frequency domain. dave mcmahon and hong chen
have done work on this.

best wishes,

dan

On 12/23/2010 1:47 PM, Jonathan Weintroub wrote:
Hi CASPERites,

Here's a somewhat fluffy RFI which I hope might start a little thought and/or discussion over the season (acknowledging that not all in the global collaboration celebrate the traditional Western winter holidays):

At SMA we are looking into the use of CASPER methods to build an ultra wideband high spectral resolution correlator. Typical specs are, say, 18 GHz bandwidth with roughly 300 kHz spectral resolution, by two polarizations, full Stokes. We are considering using a standard CASPER packetized FX architecture (FX much better for high res than XF), but in the relatively unexplored "small number of antennas, wide bandwidth" regime. If the entire 18 GHz were eaten by one ADC, this would require a sample rate of 40 Gsps and 64 kpoint PFB. Perhaps more reasonable would be two 9 GHz BW blocks and a 32 k PFB sampled at about 20 Gsps, or three 6 GHz / 16 or 32 k PFB / 14 Gsps.
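
The channel counts quoted for the three options follow from dividing bandwidth by the target resolution and rounding up to a power of two. A minimal Python sketch of that arithmetic is below; the helper names are hypothetical, and "PFB size" is taken here to mean the number of output channels across the block, which is an assumption about the convention used above.

```python
import math


def pfb_channels(bandwidth_ghz, resolution_khz):
    """Smallest power-of-two channel count meeting the target resolution."""
    needed = math.ceil(bandwidth_ghz * 1e6 / resolution_khz)
    return 1 << (needed - 1).bit_length()   # round up to a power of two


def min_sample_rate_gsps(bandwidth_ghz):
    """Nyquist minimum for real sampling; practical rates add some margin."""
    return 2.0 * bandwidth_ghz


if __name__ == "__main__":
    for bw in (18, 9, 6):
        n = pfb_channels(bw, 300)
        print(f"{bw} GHz: >= {min_sample_rate_gsps(bw)} Gsps, "
              f"{n} channels ({bw * 1e6 / n:.0f} kHz each)")
```

For a strict 300 kHz target this gives 64 k, 32 k and 32 k channels for the 18, 9 and 6 GHz cases; a 16 k PFB on the 6 GHz block would relax the resolution to about 366 kHz.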

To start we are looking closely at the FPGA resource utilization of large PFBs. Something that probably is common knowledge amongst those experienced in FX correlator design is that the demux factor drives the utilization much faster than the size of the PFB. In that sense bandwidth is far more expensive than spectral resolution. We've put some effort into accurately quantifying the utilization, at least as far as multipliers and adders are concerned, and are expanding this analysis to block ram and other resources. And demux factor is typically radix 2, so it is very much quantized.

For example at 20 Gsps one might consider a demux factor of 128 resulting in an FPGA clock rate of 156 MHz, which is quite comfortable for the FPGA. Alternatively a demux factor of 64 with corresponding FPGA clock of twice that, or over 300 MHz. Traditionally a rather uncomfortable regime for CASPER (we're unusual, I believe, in running iBOBs at 256 MHz for the VLBI phased array). The trouble is our analysis shows that the difference between these two demux settings in the size of PFB one can fit in a Virtex 6 is really quite large, and 128 definitely won't allow us to do what we need to do.

So we are increasingly motivated to run the FPGAs faster still. Just a 20% increment from the 256 MHz which we currently view as a practical upper limit allows us to cross a clock rate threshold which then enables a factor of two decrease in demux factor, and consequent even larger increment in the realizable PFB size.
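
The threshold effect described here comes from the demux factor being a power of two. A small Python sketch, with hypothetical helper names and assuming the FPGA clock is simply the sample rate divided by the demux factor, makes the quantization explicit:

```python
def fpga_clock_mhz(sample_rate_gsps, demux):
    """FPGA clock needed when the ADC stream is split into `demux` lanes."""
    return sample_rate_gsps * 1e3 / demux


def min_demux(sample_rate_gsps, max_clock_mhz):
    """Smallest power-of-two demux whose clock does not exceed the limit."""
    demux = 1
    while fpga_clock_mhz(sample_rate_gsps, demux) > max_clock_mhz:
        demux *= 2
    return demux


if __name__ == "__main__":
    for limit in (256.0, 312.5):
        d = min_demux(20, limit)
        print(f"limit {limit} MHz -> demux {d} "
              f"at {fpga_clock_mhz(20, d)} MHz")
```

At 20 Gsps a 256 MHz ceiling forces demux 128 (156.25 MHz), while demux 64 needs 312.5 MHz, which is about 22% above 256 MHz.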

Which is just a long winded way of asking if there are any others in the collaboration motivated to run the FPGAs faster, and whether any tricks can be shared? In particular, does the CASPER toolflow support multiple clock domains? Our understanding is not yet, but that's based on incomplete information. We know that there exist Virtex 5 (?) IP FFT cores which supposedly run at greater than 500 MHz rates, using the enhanced interconnect between DSP slices.

While on this topic of high demux factors, the tool flow largely chokes on demux factors of 32 or greater. Any tips here would also be appreciated.

If anyone can cast light on this general topic and related concerns it would be very much appreciated.

Jonathan Weintroub
SAO








