Re: [casper] virtex5 arithmetic speed

Ryan Monroe Tue, 18 Sep 2012 18:37:09 -0700

Just a comment: It is actually pretty practical to adjust our FFTs to run with word-lengths of up to about 27 bits (the last stages would use double DSP resources). FFT lengths which would need this are not completely implausible on a ROACH 2, so feel free to speak up if the need arises.
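Why wider words cost extra DSP slices can be sketched from the 25 x 18 signed multiplier in each Virtex-5 DSP48E slice (the "18 bit x 25 bit" figure Jason gives below). This is a rough model, not how the Xilinx tools actually map multipliers. It assumes signed operands are split into partial products: the top piece keeps the sign and may use the full port width, and each lower piece is unsigned, so it can use only one bit less than the signed port. It also assumes an 18-bit coefficient on the other input, which the thread does not state. The function names are hypothetical, and the count ignores the extra adders and any fabric-based multipliers the tools might infer instead.

```python
import math

# Signed input widths of a Virtex-5 DSP48E multiplier (25 x 18).
A_WIDTH = 25
B_WIDTH = 18


def chunks(width: int, port: int) -> int:
    """Pieces a signed `width`-bit operand splits into for a signed `port`-bit input.

    The top piece keeps the sign and may use all `port` bits; every lower piece
    is unsigned, so it can only use `port - 1` bits of a signed port.
    """
    if width <= port:
        return 1
    return 1 + math.ceil((width - port) / (port - 1))


def dsp48e_per_multiply(a_bits: int, b_bits: int) -> int:
    """Rough DSP48E count for one signed a_bits x b_bits multiply.

    One slice per partial product, trying both operand orientations and keeping
    the cheaper one. Adder and pipelining costs are ignored.
    """
    def tiles(x: int, y: int) -> int:
        return chunks(x, A_WIDTH) * chunks(y, B_WIDTH)

    return min(tiles(a_bits, b_bits), tiles(b_bits, a_bits))


# Data word widths against an assumed 18-bit coefficient.
for data_bits in (18, 25, 27):
    print(data_bits, dsp48e_per_multiply(data_bits, 18))
```

Under these assumptions the loop prints 1, 1 and 2 slices respectively: anything up to 25 bits fits one slice, while a 27-bit word needs a second partial product, which is the doubling Ryan mentions. The same model gives 4 slices for a 35 x 35 multiply.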


--Ryan



On 09/18/2012 12:45 AM, Alex Zahn wrote:

Thank you--that's very useful. I didn't know the DSP slices could do 5ns multiplies.

Ultimately, what I'm getting at here is trying to estimate how many filter taps I can reasonably support on a 5 ns clock, with new data words arriving on every clock, questions of available chip resources aside.

If I understand this correctly, even with new data arriving on every 5 ns clock, ROACH should (up to practical considerations) be able to operate as many taps as can fit on the FPGA. Is this right?
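The reasoning behind Alex's question can be sketched with a cycle-by-cycle model of a transposed-form FIR filter, which is one common way to build a fully pipelined FIR (an assumption here; the thread does not say which structure is used). Each clock, one new sample is multiplied by every tap in parallel and the partial sums move one register toward the output, so one sample goes in and one result comes out per clock regardless of the tap count. The number of taps then only sets how many multipliers are needed. `transposed_fir` and `max_taps` are hypothetical helpers, and `max_taps` assumes each tap fits in a single DSP slice, using the 640 slices Jason quotes for ROACH-1 and ignoring fabric multipliers or symmetric-filter folding.

```python
def transposed_fir(samples, taps):
    """Cycle-by-cycle model of a transposed-form FIR filter.

    Each loop iteration is one clock: one sample enters, one output leaves.
    Every tap does one multiply and one add per clock.
    """
    n = len(taps)
    regs = [0] * (n - 1)  # pipeline registers holding partial sums
    outputs = []
    for x in samples:
        # All taps multiply the same incoming sample in parallel.
        products = [x * h for h in taps]
        outputs.append(products[0] + (regs[0] if regs else 0))
        # Each register takes its tap's product plus the next register's old sum.
        regs = [
            products[i + 1] + (regs[i + 1] if i + 1 < n - 1 else 0)
            for i in range(n - 1)
        ]
    return outputs


def max_taps(dsp_slices: int = 640, dsp_per_tap: int = 1) -> int:
    """Upper bound on taps if every tap needs `dsp_per_tap` dedicated DSP slices."""
    return dsp_slices // dsp_per_tap


print(transposed_fir([1, 0, 0, 0], [1, 2, 3]))  # impulse response: the taps, then 0
print(max_taps())
```

The impulse test prints `[1, 2, 3, 0]`, and `max_taps()` returns 640. Adding taps lengthens the register chain (and so the latency) but never slows the input rate, which is why the practical limit is resources rather than the 5 ns clock.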


-Alex

On Mon, Sep 17, 2012 at 11:45 PM, Jason Manley wrote:


    The latency through an FPGA will be high relative to a CPU/GPU,
    because the FPGA's clock rate is lower (1/200MHz=5ns). But these
    operations can be pipelined so that you can do a DSP operation on
    every clock cycle. ROACH 1 and ROACH 2 will both run at 200MHz
    very easily.
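The difference between latency and throughput in a pipeline can be sketched with a toy register model. A new input enters every clock and each value advances one stage per clock edge, so the first result appears after `stages` clocks, but after that one result arrives every clock: K inputs take `stages + K - 1` clocks in total. `run_pipeline` is a hypothetical helper, and the stage count used below is made up for illustration, not the actual latency of a DSP48E slice.

```python
def run_pipeline(inputs, stages, op):
    """Toy model of a `stages`-deep pipeline, one new input per clock.

    `op` is applied when a value enters; the remaining stages model register
    delay. Returns ([(clock, result), ...], total_clocks).
    """
    regs = [None] * stages  # regs[i] holds the value in stage i (None = empty)
    pending = list(inputs)
    results, clocks = [], 0
    # Keep clocking while inputs remain or values are still moving toward the end.
    while pending or any(r is not None for r in regs[:-1]):
        clocks += 1
        new = op(pending.pop(0)) if pending else None
        regs = [new] + regs[:-1]  # every stage advances on the clock edge
        if regs[-1] is not None:
            results.append((clocks, regs[-1]))
    return results, clocks


pairs = [(i, i + 1) for i in range(1000)]
results, clocks = run_pipeline(pairs, stages=5, op=lambda p: p[0] * p[1])
print(results[0][0], clocks)  # first result on clock 5; 1000 results in 1004 clocks
```

At 5 ns per clock that is 25 ns before the first product but only about 5 us for all 1000, which is why a long FPGA latency does not limit the sample rate.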

    Considering ROACH-1, it has 640 DSP slices, and you can do up to an
    18 bit x 25 bit multiply in a single DSP slice. So you can do 640
    multiply (and/or add) operations every 1/200MHz = 5ns.
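The DSP-only figure above works out as follows, using only the numbers quoted in the thread (640 DSP slices, 200 MHz) and counting a multiply, or a multiply-add in one slice, as one operation:

```python
CLOCK_HZ = 200_000_000  # ROACH clock rate quoted in the thread
DSP_SLICES = 640        # DSP slices on ROACH-1, per the thread

period_ns = 1e9 / CLOCK_HZ            # time per clock cycle
dsp_ops_per_s = DSP_SLICES * CLOCK_HZ  # one operation per slice per clock
print(f"{period_ns:.1f} ns per clock, {dsp_ops_per_s / 1e9:.0f} G operations/s")
```

This prints `5.0 ns per clock, 128 G operations/s`, the ceiling from the DSP slices alone before any fabric or BRAM tricks.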

    But then you can also start using the 14720 logic slices to build
    multipliers or adders, so you can get many more operations per
    second. And then, if you're doing low-resolution operations, you
    can fill the 244 BRAMs with lookup tables and just look up the
    product for a given input vector to do even more operations on
    every clock cycle.
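The lookup-table idea can be sketched in software for signed 4-bit inputs: all 16 x 16 = 256 products are precomputed once, and a "multiply" becomes a single read at address `(a << 4) | b`, the way the two 4-bit codes would drive a BRAM address bus. Every product lies between -56 and 64, so each entry fits in 8 bits and the whole table is 2048 bits, a small fraction of one 36 Kb Virtex-5 block RAM. The names below are hypothetical, and this models only the table contents, not how a real design would pack or address the BRAMs.

```python
def to_signed4(v: int) -> int:
    """Interpret the low 4 bits of v as a two's-complement value (-8..7)."""
    v &= 0xF
    return v - 16 if v & 0x8 else v


# Precompute every signed 4-bit x 4-bit product: 256 entries.
# Address = (a_code << 4) | b_code, using the raw 4-bit codes.
PRODUCT_LUT = [to_signed4(addr >> 4) * to_signed4(addr) for addr in range(256)]


def lut_multiply(a: int, b: int) -> int:
    """Multiply two signed 4-bit values by table lookup instead of arithmetic."""
    return PRODUCT_LUT[((a & 0xF) << 4) | (b & 0xF)]


print(lut_multiply(-3, 7), lut_multiply(-8, -8))
```

This prints `-21 64`. In hardware the table read takes one clock no matter how the product would otherwise be computed, and a dual-port BRAM can serve two lookups per clock.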

    If you wanted to throw the whole FPGA at DSP operations, you could
    easily say that a ROACH-1 board is capable of over 2 TeraOps/s for
    4-bit operations (common in radio astronomy). But this is an
    unrealistic figure of merit because it ignores things like
    pipelining registers and data routing requirements, memory
    controllers and the like which would all be needed in a practical
    design.
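Working backwards from the headline figure shows how much of it has to come from outside the DSP slices. The inputs are the thread's numbers; "2 TeraOps/s" is taken at face value as a lower bound:

```python
CLOCK_HZ = 200_000_000     # ROACH clock rate from the thread
CLAIMED_OPS_PER_S = 2e12   # "over 2 TeraOps/s" for 4-bit operations
DSP_SLICES = 640           # ROACH-1 DSP slices, one operation each per clock

ops_per_clock = CLAIMED_OPS_PER_S / CLOCK_HZ
dsp_share = DSP_SLICES / ops_per_clock
print(f"{ops_per_clock:.0f} ops per clock; DSP slices supply {dsp_share:.1%} of them")
```

This prints `10000 ops per clock; DSP slices supply 6.4% of them`, so the remaining operations per clock would have to come from logic slices and BRAM lookup tables, which is exactly the part that routing and pipelining overheads eat into.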

    Jason

    On 18 Sep 2012, at 05:20, Alex Zahn wrote:

    > I've been browsing the xilinx literature, but I just can't seem
    to get any idea how long one can usually expect addition and
    multiplication operations to take. I realize this depends on a lot
    of factors in the design, but does anyone know if it's reasonable
    to multiply two 16 bit numbers in a single clock with a clock rate
    of 200 MHz? I would test this out on my ROACH to find out, but I'm
    away from lab for a while, and thus rendered rather helpless for
    the time being.
    >
    > Unrelated, is there any online documentation on the new snapshot
    block?
    >
    > -Alex Zahn
