On point 2, it seems to me that if we are digitizing a 9 GHz band at 20
Gsps, one still needs substantial demux (at least 64) no matter how
small the PFB.
As Suraj points out, this is far in excess of practical limits. This
is consistent with what we have found: BW is the difficult part, a large
PFB for high resolution less so.
hi jonathan,
i agree you need to demux 20 Gsps by 64 or 128, but i don't think this
will be a problem.
20 Gsps should fit pretty easily into an FPGA in an FFX correlator:
in my example of the FFX, you'd need to implement an 8 point PFB
on the first FPGA to break the 10 GHz band into 8 sub-bands.
let's assume you demux by 64, and clock the FPGA at 312.5 MHz:
you'd need 64*8 multipliers to implement the FIR part of an 8 tap PFB,
and 64*16 multipliers to implement the real to complex FFT part of the
PFB.
all the multipliers have fixed coefficients, so no block rams are needed
to store coefficients, and none are needed for delays either, as you'd
implement the butterfly diagram directly.
so there's no coefficient routing, but there is data routing.
the data paths can all be 8 bit, and you can add pipeline registers
where needed, so you should be able to get to 312.5 MHz.
if you can't get the FPGA to route at 312.5 MHz, then you'd have
to demux by 128, and you'd need twice as many multipliers.
(instead of 1536 multipliers, it would take 3072 multipliers).
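The multiplier arithmetic above can be written as a small parametric sketch. The per-lane counts (8 FIR multipliers and 16 FFT multipliers per demux lane) are taken directly from the text rather than derived from a netlist, and the function names are hypothetical.

```python
# Hypothetical resource model for the first-stage 8 sub-band PFB of an FFX
# correlator. Per-lane multiplier counts come from the email text, not from
# synthesis results.

FIR_TAPS = 8              # multipliers per demux lane for an 8-tap FIR front end
FFT_MULTS_PER_LANE = 16   # real-to-complex FFT multipliers per demux lane (from the text)


def fpga_clock_mhz(sample_rate_gsps: float, demux: int) -> float:
    """Fabric clock needed when `demux` ADC samples arrive per FPGA cycle."""
    return sample_rate_gsps * 1e3 / demux


def first_stage_multipliers(demux: int) -> dict:
    """Multiplier count for the FIR and FFT parts at a given demux factor."""
    fir = demux * FIR_TAPS
    fft = demux * FFT_MULTS_PER_LANE
    return {"fir": fir, "fft": fft, "total": fir + fft}


if __name__ == "__main__":
    for demux in (64, 128):
        mults = first_stage_multipliers(demux)["total"]
        print(f"demux {demux}: {fpga_clock_mhz(20.0, demux)} MHz, {mults} multipliers")
```

Running it prints 312.5 MHz with 1536 multipliers for demux 64, and 156.25 MHz with 3072 multipliers for demux 128: halving the clock doubles the multiplier count.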
you can use block rams for many of the multipliers, as most of the
computations are multiplying 8 bit data by a fixed coefficient,
so an 8 bit in, 8 bit out look up table (256 entries) is all you need.
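A minimal sketch of the look-up-table multiplier idea, assuming signed 8-bit two's-complement samples and a signed 8-bit output that is rounded half up and saturated (the text does not specify the scaling or rounding actually used in hardware). The table is indexed by the raw 8-bit input pattern, as a block RAM address port would see it, so it holds 256 entries of 8 bits (2 kbit). All function names are hypothetical.

```python
import math


def to_signed8(pattern: int) -> int:
    """Interpret an 8-bit pattern (0..255) as a two's-complement value."""
    return pattern - 256 if pattern >= 128 else pattern


def quantize8(value: float) -> int:
    """Round half up, then saturate to the signed 8-bit range (assumed policy)."""
    return max(-128, min(127, math.floor(value + 0.5)))


def build_const_mult_lut(coeff: float) -> list[int]:
    """256-entry table: address = raw 8-bit input, entry = 8-bit product."""
    return [quantize8(to_signed8(addr) * coeff) for addr in range(256)]


def lut_multiply(lut: list[int], x: int) -> int:
    """Multiply a signed 8-bit sample by the table's fixed coefficient."""
    return lut[x & 0xFF]  # same bit pattern a BRAM address port would see


if __name__ == "__main__":
    coeff = math.cos(math.pi / 8)  # e.g. a fixed twiddle factor of a 16-point FFT
    lut = build_const_mult_lut(coeff)
    # The table reproduces the direct rounded multiply for every possible input.
    assert all(lut_multiply(lut, x) == quantize8(x * coeff) for x in range(-128, 128))
    print(len(lut), lut_multiply(lut, 100), lut_multiply(lut, -100))
```

Because the coefficient never changes, the whole multiply collapses to one memory read per sample; any rounding or saturation rule can be baked into the table at build time.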
if you don't want to implement an 8 channel PFB,
you could also implement this as eight DDC's running in parallel
from the same ADC data, each DDC with a different downmix frequency.
the mixer coefficients are fixed, and many of the coefficients are 0, 1,
or -1.
the DDC's low pass filter coefficients are fixed as well - you can use
look up tables for the low pass filter multipliers and the mixer
multipliers if you are short on DSP48's.
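To see how often a mixer coefficient needs no real multiplier, the sketch below counts the real and imaginary coefficient parts that are exactly 0, +1 or -1. It assumes, hypothetically, that the eight downmix frequencies are f_k = k * fs / 16 (k = 0..7), i.e. centred on the bins of a 16-point real-to-complex FFT; other downmix frequencies would give different counts.

```python
import cmath
import math

N_SUBBANDS = 8
PERIOD = 2 * N_SUBBANDS  # mixer period in samples when f_k = k * fs / 16 (assumed)
TOL = 1e-12


def mixer_coeffs(k: int) -> list[complex]:
    """One period of exp(-j*2*pi*f_k*n/fs) with f_k = k * fs / PERIOD."""
    return [cmath.exp(-2j * math.pi * k * n / PERIOD) for n in range(PERIOD)]


def is_trivial(x: float) -> bool:
    """True if x is 0, +1 or -1: the 'multiply' is a zero, wire or negation."""
    return abs(x) < TOL or abs(abs(x) - 1.0) < TOL


def trivial_fraction(coeffs: list[complex]) -> float:
    """Fraction of real and imaginary coefficient parts that need no multiplier."""
    parts = [c.real for c in coeffs] + [c.imag for c in coeffs]
    return sum(is_trivial(p) for p in parts) / len(parts)


if __name__ == "__main__":
    for k in range(N_SUBBANDS):
        print(f"sub-band {k}: {trivial_fraction(mixer_coeffs(k)):.2f} trivial")
```

Under this assumption, sub-bands 0 and 4 need no mixer multipliers at all, sub-bands 2 and 6 have half their coefficient parts trivial, and the odd sub-bands a quarter; averaged over all eight, half the mixer coefficient parts are 0 or +/-1.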
best wishes,
dan
BTW I realize as I write that my 6 GHz BW, demux 32 case, suggested in
response to Suraj, still requires a > 400 MHz FPGA clock, and is thus
not so practical. Can one gain a factor of 2 in demux by doing
quadrature sampling, with I and Q inputs to a complex-input PFB, each
at 1/2 the rate?
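The clock arithmetic behind this question can be tabulated with a small sketch. It assumes the 14 Gsps real sample rate quoted for the 6 GHz case, and that quadrature sampling uses two ADC streams (I and Q) each at half that rate; whether a complex-input PFB is cheaper overall is not modelled. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SamplingConfig:
    label: str
    stream_rate_gsps: float  # sample rate of each ADC stream
    n_streams: int           # 1 for real sampling, 2 for I and Q
    demux: int               # samples per FPGA clock, per stream

    def clock_mhz(self) -> float:
        """Fabric clock needed to keep up with one stream."""
        return self.stream_rate_gsps * 1e3 / self.demux

    def values_per_clock(self) -> int:
        """Real-valued samples entering the FPGA on each clock cycle."""
        return self.n_streams * self.demux


REAL_RATE_GSPS = 14.0              # single real ADC for the 6 GHz case
IQ_RATE_GSPS = REAL_RATE_GSPS / 2  # assumed rate of each I and Q ADC

CONFIGS = [
    SamplingConfig("real, demux 32", REAL_RATE_GSPS, 1, 32),
    SamplingConfig("I/Q, demux 32 per stream", IQ_RATE_GSPS, 2, 32),
    SamplingConfig("I/Q, demux 16 per stream", IQ_RATE_GSPS, 2, 16),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.label}: {cfg.clock_mhz()} MHz, {cfg.values_per_clock()} values/clock")
```

Quadrature sampling either halves the clock at the same per-stream demux (437.5 MHz down to 218.75 MHz) or keeps 437.5 MHz with demux 16 per stream. In every case the number of real-valued samples entering the FPGA per second is unchanged, so any saving would have to come from the structure of the complex PFB rather than from handling less data.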
Jonathan
On Dec 23, 2010, at 5:24 PM, Dan Werthimer wrote:
hi jonathan,
some ideas for your correlator:
1)
300 MHz is a good target, especially for V6.
suraj has shown how to achieve 375 MHz for V5
by using floor planning and auto-placing.
suraj or i can send you his draft paper on this if you'd like.
2)
you might want to consider FFX instead of FX:
eg: digitizing your 9 GHz band and using a PFB to break it up into
eight sub-bands of 1.25 GHz each, and then sending the sub-bands into
eight 1.25 GHz FX correlators. this will simplify your switch
requirements, and each correlator now has only 4K channels, which is
better suited for corner turning in a roach II.
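A small sketch of the FFX partitioning arithmetic, assuming a 20 Gsps ADC whose 10 GHz Nyquist band is split into eight contiguous 1.25 GHz sub-bands, and a hypothetical total of 32K channels (the size of a single 32k PFB) spread evenly across them.

```python
SAMPLE_RATE_GSPS = 20.0
N_SUBBANDS = 8
TOTAL_CHANNELS = 32 * 1024  # hypothetical: same channel count as a single 32k PFB

NYQUIST_GHZ = SAMPLE_RATE_GSPS / 2        # 10 GHz
SUBBAND_GHZ = NYQUIST_GHZ / N_SUBBANDS    # 1.25 GHz per FX correlator
CHANNELS_PER_CORRELATOR = TOTAL_CHANNELS // N_SUBBANDS


def subband_edges_ghz(k: int) -> tuple[float, float]:
    """Lower and upper edge of sub-band k within the Nyquist band."""
    return (k * SUBBAND_GHZ, (k + 1) * SUBBAND_GHZ)


def channel_width_khz() -> float:
    """Spectral resolution inside each sub-band correlator."""
    return SUBBAND_GHZ * 1e6 / CHANNELS_PER_CORRELATOR


if __name__ == "__main__":
    for k in range(N_SUBBANDS):
        lo, hi = subband_edges_ghz(k)
        print(f"sub-band {k}: {lo:.2f}-{hi:.2f} GHz")
    print(f"{CHANNELS_PER_CORRELATOR} channels of {channel_width_khz():.1f} kHz each")
```

Each sub-band correlator then handles 4096 channels of about 305 kHz, close to the ~300 kHz resolution target, while its corner turn only has to reorder 4K channels rather than 32K.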
3)
also, be sure to use billy's latest FFT (recently checked in),
which moves all the adders and multipliers into DSP48's and makes
routing easier.
you should also consider bit growth FFT's and PFB's, which start
out with the 4 or 5 or 8 bits from your ADC, and add bits gradually
as you move into the frequency domain. dave mcmahon and hong chen
have done work on this.
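A minimal sketch of a bit-growth schedule, assuming 8-bit ADC input, one extra bit per radix-2 stage (roughly the worst-case growth of a butterfly), and a hypothetical 18-bit cap chosen to match an 18-bit multiplier port. The scheme dave mcmahon and hong chen worked on may allocate bits differently; this only illustrates the "start narrow, grow gradually" idea.

```python
def stage_widths(input_bits: int, n_stages: int, cap_bits: int,
                 growth_per_stage: int = 1) -> list[int]:
    """Output word width after each FFT stage under a simple growth policy."""
    widths = []
    width = input_bits
    for _ in range(n_stages):
        width = min(width + growth_per_stage, cap_bits)  # grow, but never past the cap
        widths.append(width)
    return widths


if __name__ == "__main__":
    N_STAGES = 15   # radix-2 stages of a 32k-point FFT
    CAP = 18        # hypothetical cap, e.g. one 18-bit multiplier port
    grown = stage_widths(8, N_STAGES, CAP)
    fixed = [CAP] * N_STAGES
    print(grown)
    print(f"summed stage widths: {sum(grown)} vs {sum(fixed)} at fixed width")
```

Under these assumptions the summed stage widths drop from 270 to 225 bits, about 17% less register and routing width per lane, before counting any savings from narrower multipliers in the early stages.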
best wishes,
dan
On 12/23/2010 1:47 PM, Jonathan Weintroub wrote:
Hi CASPERites,
Here's a somewhat fluffy RFI which I hope might start a little
thought and/or discussion over the season (acknowledging that not
all in the global collaboration celebrate the traditional Western
winter holidays):
At SMA we are looking into the use of CASPER methods to build an
ultra wideband, high spectral resolution correlator. Typical specs
are, say, 18 GHz bandwidth with roughly 300 kHz spectral resolution,
by two polarizations, full Stokes. We are considering using a
standard CASPER packetized FX architecture (FX is much better for high
resolution than XF), but in the relatively unexplored "small number of
antennas, wide bandwidth" regime. If the entire 18 GHz were digitized
by one ADC, this would require a sample rate of 40 Gsps and a 64k-point
PFB. Perhaps more reasonable would be two 9 GHz BW blocks with a 32k
PFB sampled at about 20 Gsps, or three 6 GHz blocks with a 16k or 32k
PFB at 14 Gsps.
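A quick check of these configurations against the ~300 kHz target, assuming the "k" figures count output channels spanning the Nyquist band (fs/2). If they instead denote real-input FFT lengths, which yield half as many channels, every width below doubles.

```python
def channel_width_khz(sample_rate_gsps: float, n_channels: int) -> float:
    """Channel width when n_channels span the Nyquist band fs/2."""
    return sample_rate_gsps * 1e6 / 2 / n_channels


# (label, ADC rate in Gsps, channel count) from the specs above; reading
# "64 kpoint" etc. as channel counts is an assumption.
CONFIGS = [
    ("18 GHz, one ADC", 40.0, 64 * 1024),
    ("9 GHz blocks", 20.0, 32 * 1024),
    ("6 GHz blocks, 16k", 14.0, 16 * 1024),
    ("6 GHz blocks, 32k", 14.0, 32 * 1024),
]

if __name__ == "__main__":
    for label, rate, nchan in CONFIGS:
        print(f"{label}: {channel_width_khz(rate, nchan):.1f} kHz")
```

With this reading the 40 Gsps and 20 Gsps options both land at about 305 kHz, while the 14 Gsps option brackets the target (about 427 kHz with 16k channels, 214 kHz with 32k).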
To start we are looking closely at the FPGA resource utilization of
large PFBs. Something that probably is common knowledge amongst
those experienced in FX correlator design is that the demux factor
drives the utilization much faster than the size of the PFB. In
that sense bandwidth is far more expensive than spectral
resolution. We've put some effort into accurately quantifying the
utilization, at least as far as multipliers and adders are
concerned, and are expanding this analysis to block ram and other
resources. And the demux factor is typically a power of two, so it is
very much quantized.
For example, at 20 Gsps one might consider a demux factor of 128,
resulting in an FPGA clock rate of 156.25 MHz, which is quite
comfortable for the FPGA. Alternatively, a demux factor of 64
requires an FPGA clock of twice that, 312.5 MHz, which has
traditionally been a rather uncomfortable regime for CASPER (we're
unusual, I believe, in running iBOBs at 256 MHz for the VLBI phased
array). The trouble is that our analysis shows the difference between
these two demux settings, in the size of PFB one can fit in a
Virtex 6, is really quite large, and 128 definitely won't allow us to
do what we need to do.
So we are increasingly motivated to run the FPGAs faster still. Just a
roughly 20% increase over the 256 MHz that we currently view as a
practical upper limit (to 312.5 MHz at 20 Gsps) would let us cross a
clock rate threshold that enables a factor of two decrease in demux
factor, and a consequent, even larger increase in the realizable PFB
size.
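Because the demux factor is a power of two, the minimum demux is a step function of the achievable clock. The sketch below (hypothetical helpers, not part of the CASPER toolflow) finds the smallest power-of-two demux that keeps the fabric clock at or below a given limit.

```python
def fpga_clock_mhz(sample_rate_gsps: float, demux: int) -> float:
    """Fabric clock when `demux` ADC samples are processed per cycle."""
    return sample_rate_gsps * 1e3 / demux


def min_radix2_demux(sample_rate_gsps: float, max_clock_mhz: float) -> int:
    """Smallest power-of-two demux whose fabric clock is <= max_clock_mhz."""
    demux = 1
    while fpga_clock_mhz(sample_rate_gsps, demux) > max_clock_mhz:
        demux *= 2
    return demux


if __name__ == "__main__":
    for limit in (256.0, 300.0, 312.5, 375.0):
        d = min_radix2_demux(20.0, limit)
        print(f"limit {limit} MHz -> demux {d} at {fpga_clock_mhz(20.0, d)} MHz")
```

At 20 Gsps, a 256 MHz or 300 MHz limit still forces demux 128 (156.25 MHz), while reaching 312.5 MHz, about 22% above 256 MHz, permits demux 64. At 14 Gsps a 400 MHz limit still forces demux 64, since demux 32 would need 437.5 MHz.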
Which is just a long winded way of asking if there are any others in
the collaboration motivated to run the FPGAs faster, and whether any
tricks can be shared? In particular, does the CASPER toolflow
support multiple clock domains? Our understanding is not yet, but
that's based on incomplete information. We know that there exist
Virtex 5 (?) IP FFT cores which supposedly run at rates greater than
500 MHz, using the enhanced interconnect between DSP slices.
While on this topic of high demux factors, the tool flow largely
chokes on demux factors of 32 or greater. Any tips here would also
be appreciated.
If anyone can cast light on this general topic and related concerns
it would be very much appreciated.
Jonathan Weintroub
SAO