Just a comment: It is actually pretty practical to adjust our FFTs to
run with word-lengths of up to about 27 bits (the last stages would use
double DSP resources). FFT lengths which would need this are not
completely implausible on a ROACH 2, so feel free to speak up if the
need arises.
--Ryan
On 09/18/2012 12:45 AM, Alex Zahn wrote:
Thank you--that's very useful. I didn't know the DSP slices could do 5
ns multiplies.
Ultimately, what I'm getting at here is trying to estimate how
many filter taps I can reasonably support on a 5 ns clock, with new
data words arriving on every clock, questions of available chip
resources aside.
If I understand this correctly, even with new data arriving on every 5
ns clock, ROACH should (up to practical considerations) be able to
operate as many taps as can fit on the FPGA. Is this right?
-Alex
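To illustrate why the tap count does not limit the sample rate, here is a minimal Python sketch of a transposed-form FIR filter, modelled one clock at a time. It assumes one multiplier and one adder register per tap; the function name `transposed_fir` and the integer coefficients are illustrative only, and this is not necessarily how any particular FPGA library block implements its filters.

```python
def transposed_fir(samples, coeffs):
    """Cycle-by-cycle model of a transposed-form FIR filter.

    Every tap has its own multiplier and its own partial-sum register,
    so each clock consumes one input sample and emits one output sample
    no matter how many taps there are. Requires at least one coefficient.
    """
    n_taps = len(coeffs)
    # regs[k] holds the partial sum that tap k will add on the next clock.
    regs = [0] * n_taps
    outputs = []
    for x in samples:  # one loop iteration == one clock cycle
        # Tap 0 adds its product to the partial sum stored last clock.
        outputs.append(coeffs[0] * x + regs[0])
        # All remaining taps multiply the same input in parallel and pass
        # their partial sums one register closer to the output.
        regs = [coeffs[k + 1] * x + regs[k + 1] for k in range(n_taps - 1)] + [0]
    return outputs


def direct_convolution(samples, coeffs):
    """Reference result: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]
```

In this model, adding taps adds multipliers and registers (chip resources) but leaves the rate of one output per clock unchanged, which is the behaviour asked about above.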
On Mon, Sep 17, 2012 at 11:45 PM, Jason Manley wrote:
The latency through an FPGA will be high relative to a CPU/GPU,
because the FPGA's clock rate is lower (1/200MHz=5ns). But these
operations can be pipelined so that you can do a DSP operation on
every clock cycle. ROACH 1 and ROACH 2 will both run at 200MHz
very easily.
Considering ROACH-1, it has 640 DSP slices and you can do up to an
18 bit x 25 bit multiply in a single DSP slice. So you can do 640
multiply and/or add operations every 1/200MHz=5ns.
But then you can also start using the 14720 slices for multipliers
or adders so you can get many more operations per second. And
then, if you're doing low resolution operations, you can fill the
244 BRAMs with lookup tables and just lookup the product for a
given input vector to do even more operations on every clock cycle.
If you wanted to throw the whole FPGA at DSP operations, you could
easily say that a ROACH-1 board is capable of over 2 TeraOps/s for
4-bit operations (common in radio astronomy). But this is an
unrealistic figure of merit because it ignores things like
pipelining registers and data routing requirements, memory
controllers and the like which would all be needed in a practical
design.
Jason
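As a rough check of the arithmetic in the message above, the following Python sketch computes the clock period and the DSP-slice-only throughput from the figures quoted there (200 MHz, 640 DSP slices). The function names are illustrative, and the one-multiplier-per-tap assumption is a simplification. It deliberately leaves out the fabric slices and BRAM lookup tables, so it does not try to reproduce the 2 TeraOps/s estimate.

```python
CLOCK_HZ = 200_000_000  # 200 MHz fabric clock, as quoted in the thread
DSP_SLICES = 640        # DSP slices on ROACH-1, as quoted in the thread


def clock_period_ns(clock_hz=CLOCK_HZ):
    """Clock period in nanoseconds."""
    return 1e9 / clock_hz


def dsp_ops_per_second(dsp_slices=DSP_SLICES, clock_hz=CLOCK_HZ):
    """Operations per second if every DSP slice does one op per clock."""
    return dsp_slices * clock_hz


def max_parallel_taps(dsp_slices=DSP_SLICES):
    """Upper bound on FIR taps at one sample per clock.

    Assumes one DSP slice per tap and no other use of the DSP slices.
    """
    return dsp_slices


if __name__ == "__main__":
    print(f"clock period: {clock_period_ns():.1f} ns")
    print(f"DSP-only throughput: {dsp_ops_per_second() / 1e9:.0f} GOps/s")
    print(f"taps at one sample per clock (DSP only): {max_parallel_taps()}")
```

Under these assumptions the DSP slices alone give 128 GOps/s; the larger figure in the message comes from also using the general-purpose slices and BRAM lookup tables.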
On 18 Sep 2012, at 05:20, Alex Zahn wrote:
> I've been browsing the Xilinx literature, but I just can't seem
> to get any idea how long one can usually expect addition and
> multiplication operations to take. I realize this depends on a lot
> of factors in the design, but does anyone know if it's reasonable
> to multiply two 16 bit numbers in a single clock with a clock rate
> of 200 MHz? I would test this out on my ROACH to find out, but I'm
> away from lab for a while, and thus rendered rather helpless for
> the time being.
>
> Unrelated, is there any online documentation on the new snapshot
> block?
>
> -Alex Zahn