Just a comment: It is actually pretty practical to adjust our FFTs to run with word-lengths of up to about 27 bits (the last stages would use double DSP resources). FFT lengths which would need this are not completely implausible on a ROACH 2, so feel free to speak up if the need arises.

--Ryan


On 09/18/2012 12:45 AM, Alex Zahn wrote:
Thank you--that's very useful. I didn't know the DSP slices could do 5 ns multiplies.

Ultimately, what I'm getting at is an estimate of how many filter taps I can reasonably support on a 5 ns clock, with a new data word arriving on every clock, setting aside the question of available chip resources.

If I understand this correctly, even with new data arriving on every 5 ns clock, ROACH should (up to practical considerations) be able to operate as many taps as can fit on the FPGA. Is this right?

-Alex
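
To make the question above concrete, here is a minimal Python sketch of a fully parallel, transposed-form FIR filter simulated clock by clock. It is only a behavioural model under stated assumptions, not a ROACH or CASPER library block: the function name `transposed_fir` is hypothetical, each tap is assumed to have its own multiplier, and the extra pipeline latency inside a real DSP slice is ignored. What it illustrates is that the filter accepts one new sample per clock no matter how many taps there are, so the tap count is bounded by resources rather than by the clock.

```python
def transposed_fir(samples, coeffs):
    """Behavioural model of a transposed-form FIR, one sample per clock.

    Assumption: one dedicated multiplier per tap, so all products for the
    current sample are formed in the same clock cycle.
    """
    n_taps = len(coeffs)
    # state[k] holds the partial sum registered after tap k+1 on the previous clock.
    state = [0] * (n_taps - 1)
    outputs = []
    for x in samples:  # one loop iteration models one clock cycle
        # Output: first tap's product plus the partial sum from the last cycle.
        y = coeffs[0] * x + (state[0] if n_taps > 1 else 0)
        # Every remaining tap updates its register in parallel.
        state = [
            coeffs[k + 1] * x + (state[k + 1] if k + 1 < n_taps - 1 else 0)
            for k in range(n_taps - 1)
        ]
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Three taps, four input samples: one output is produced per input.
    print(transposed_fir([1, 2, 3, 4], [1, 10, 100]))
```

Adding taps to this structure lengthens the register chain and uses more multipliers, but the per-clock work of each tap stays the same.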

On Mon, Sep 17, 2012 at 11:45 PM, Jason Manley wrote:

    The latency through an FPGA will be high relative to a CPU/GPU,
    because the FPGA's clock rate is lower (1/200MHz=5ns). But these
    operations can be pipelined so that you can do a DSP operation on
    every clock cycle. ROACH 1 and ROACH 2 will both run at 200MHz
    very easily.

    Considering ROACH-1, it has 640 DSP slices and you can do up to an
    18 bit x 25 bit multiply in a single DSP slice. So you can do 640
    multiply (and/or add) operations every 1/200MHz=5ns.
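
As a quick check of the arithmetic in the paragraph above, the following Python sketch works out the clock period and the peak DSP-slice operation rate from the two figures quoted there (640 DSP slices, 200 MHz). The variable names are illustrative only, and the result counts one operation per slice per clock, which is an idealised upper bound rather than a measured number.

```python
DSP_SLICES = 640     # ROACH-1 DSP slice count quoted above
CLOCK_HZ = 200e6     # 200 MHz clock quoted above

# Period of one clock cycle in nanoseconds.
clock_period_ns = 1e9 / CLOCK_HZ

# Assumption: every slice completes one pipelined operation per clock.
peak_dsp_ops_per_s = DSP_SLICES * CLOCK_HZ

if __name__ == "__main__":
    print(f"clock period: {clock_period_ns} ns")
    print(f"peak DSP-slice rate: {peak_dsp_ops_per_s / 1e9} GOps/s")
```

Under these assumptions the DSP slices alone give 128 GOps/s, before counting any multipliers built from general-purpose slices or BRAM lookups.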

    But then you can also start using the 14720 slices for multipliers
    or adders so you can get many more operations per second. And
    then, if you're doing low resolution operations, you can fill the
    244 BRAMs with lookup tables and just lookup the product for a
    given input vector to do even more operations on every clock cycle.
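
The lookup-table idea in the paragraph above can be sketched in Python as follows. This is an illustration only: the 4-bit signed operand width is taken from the radio-astronomy example later in this message, while the address layout (first operand in the high nibble) and the names `PRODUCT_LUT` and `lut_multiply` are assumptions made for the example, not a description of any existing ROACH design.

```python
BITS = 4                      # operand width, assumed signed two's complement
MASK = (1 << BITS) - 1


def to_signed(u, bits=BITS):
    """Interpret an unsigned bit pattern as a two's-complement value."""
    return u - (1 << bits) if u & (1 << (bits - 1)) else u


# 256-entry table: address = (a << 4) | b, contents = a * b.
# In hardware this table would be the initial contents of a block RAM.
PRODUCT_LUT = [
    to_signed(addr >> BITS) * to_signed(addr & MASK)
    for addr in range(1 << (2 * BITS))
]


def lut_multiply(a, b):
    """Multiply two 4-bit signed values with a single table read."""
    addr = ((a & MASK) << BITS) | (b & MASK)
    return PRODUCT_LUT[addr]


if __name__ == "__main__":
    print(lut_multiply(-8, 7), lut_multiply(3, -2))
```

One table read replaces one multiplier, which is why low-resolution designs can trade memory for arithmetic.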

    If you wanted to throw the whole FPGA at DSP operations, you could
    easily say that a ROACH-1 board is capable of over 2 TeraOps/s for
    4-bit operations (common in radio astronomy). But this is an
    unrealistic figure of merit because it ignores things like
    pipelining registers and data routing requirements, memory
    controllers and the like which would all be needed in a practical
    design.

    Jason

    On 18 Sep 2012, at 05:20, Alex Zahn wrote:

    > I've been browsing the Xilinx literature, but I just can't seem
    > to get any idea how long one can usually expect addition and
    > multiplication operations to take. I realize this depends on a lot
    > of factors in the design, but does anyone know if it's reasonable
    > to multiply two 16 bit numbers in a single clock with a clock rate
    > of 200 MHz? I would test this on my ROACH to find out, but I'm
    > away from the lab for a while, and thus rendered rather helpless
    > for the time being.
    >
    > Unrelated, is there any online documentation on the new snapshot
    > block?
    >
    > -Alex Zahn


