Hey Andrew, thanks for the designs! I'll have to spend some time looking
them over later; there's some good stuff there.
Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a fair amount
of multipliers. For most applications at the moment we are BRAM limited
so this is not a problem (the very wide bandwidth instruments might be
multiplier limited at some point). It would be good as an option to
trade off multipliers for BRAM.
I haven't seen the Goertzel algorithm before, but it looks like a great
idea for this: we might be able to produce a coefficient DDS in just two
DSPs!
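A rough sketch of what a recurrence-based coefficient generator could look
like, in Python rather than Simulink. This is an assumption about the
approach, not a tested CASPER block: it uses the two-term recurrence that
the Goertzel algorithm is built on, x[k] = 2*cos(theta)*x[k-1] - x[k-2],
once for the cosine and once for the sine, so each output costs two real
multiplies. The function name recursive_twiddles is made up for
illustration.

```python
import math

def recursive_twiddles(n_points, count):
    """Return exp(-2j*pi*k/n_points) for k = 0..count-1.

    Uses x[k] = 2*cos(theta)*x[k-1] - x[k-2] for both cos and sin,
    so only one constant (2*cos(theta)) and the seed values are stored.
    """
    theta = 2.0 * math.pi / n_points
    k2c = 2.0 * math.cos(theta)              # the single stored constant

    # Seeds: values at k = -1 (prev) and k = 0 (cur).
    c_prev, c_cur = math.cos(-theta), 1.0
    s_prev, s_cur = math.sin(-theta), 0.0

    out = []
    for _ in range(count):
        out.append(complex(c_cur, -s_cur))
        # One real multiply per recurrence, two per output sample.
        c_prev, c_cur = c_cur, k2c * c_cur - c_prev
        s_prev, s_cur = s_cur, k2c * s_cur - s_prev
    return out
```

This runs in double precision. In fixed point the rounding error
accumulates from one sample to the next, so a hardware version would
probably need to reseed the state periodically.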
For my applications, I'm *totally* DSP limited, but I agree that we
should try to cater to the greater CASPER community of course.
Coefficient reuse (as you describe between phases) would be nice (at the
cost of some register stages I guess).
The CASPER libraries *hemorrhage* pipeline stages. A few more won't
hurt, and you'll be saving the RAM addressing logic. Not so bad.
I think the reuse of control logic, coefficients etc would potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime explicit
reuse with optional register stages to reduce fanout would be awesome.
You can change a setting on pipeline registers (and maybe other places
too) which allows the compiler to do this. It's called "Implement using
behavioral HDL" in Simulink, or "allow_register_retiming" in the xBlock
interface. I had a bad experience with it though: it'll try to optimize
EVERYTHING. Got two identical registers which you intend to place on
opposite sides of the chip? They're now the same register. In my
experience, the only good way to control the sharing (or lack thereof)
was to do it manually... YMMV.
I've got another idea we can consider too. This one is farther away.
I'm building radix-4 versions of my FFTs (1/2 as much fabric, 85% as
much DSP and 100% as much coeff). Now, for radix 4, you get three
coefficient banks per butterfly stage, and while the sum total (#
coefficients stored) is the same, the coefficients are actually in trios
of (x^1; x^2; x^3 and an implicit x^0). You could, in principle, store
just the x^1 and square/cube it into x^2 and x^3. I haven't tried this
(just thought of it), so no idea regarding performance. In addition,
while Dan and I are working with JPL legal to get my library
open-sourced, it's looking pretty clear that I won't be able to share
the really new stuff, so you'd have to do radix-4 on your own :-(
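To get a feel for the square/cube idea before building anything, here is
a small Python sketch. It is only a numerical model under assumptions: the
17 fractional bits, the rounding after every multiply, and the names
quantize, radix4_trio and worst_error are all made up for illustration,
and saturation is not modelled. It stores only x^1, derives x^2 and x^3
with two complex multiplies, and reports the worst deviation from the
ideal coefficients.

```python
import cmath

def quantize(z, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale,
                   round(z.imag * scale) / scale)

def radix4_trio(n_points, k, frac_bits=17):
    """Return (x^1, x^2, x^3) with only x^1 taken from storage."""
    w1 = quantize(cmath.exp(-2j * cmath.pi * k / n_points), frac_bits)
    w2 = quantize(w1 * w1, frac_bits)    # first complex multiply
    w3 = quantize(w2 * w1, frac_bits)    # second complex multiply
    return w1, w2, w3

def worst_error(n_points, frac_bits=17):
    """Largest |derived - ideal| over one radix-4 coefficient bank."""
    worst = 0.0
    for k in range(n_points // 4):
        trio = radix4_trio(n_points, k, frac_bits)
        for power, w in enumerate(trio, start=1):
            ideal = cmath.exp(-2j * cmath.pi * power * k / n_points)
            worst = max(worst, abs(w - ideal))
    return worst

if __name__ == "__main__":
    print(worst_error(4096))
```

The error of x^3 is roughly three times that of the stored x^1 plus the
extra rounding steps, so the derived banks would be a bit noisier than
banks stored directly at the same width.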
--Ryan
On 01/22/2013 04:41 AM, Andrew Martens wrote:
Hi all
It would work well for the PFB, but what we *really* need is a
solid "Direct Digital Synth (DDS) coefficient generator".
...
Nice idea (I think the Goertzel algorithm is often used with this
technique?). I have considered this for the DDC, it allows almost
arbitrary frequency and phase resolution. The only cost is a fair amount
of multipliers. For most applications at the moment we are BRAM limited
so this is not a problem (the very wide bandwidth instruments might be
multiplier limited at some point). It would be good as an option to
trade off multipliers for BRAM.
Coefficient reuse (as you describe between phases) would be nice (at the
cost of some register stages I guess).
I think the reuse of control logic, coefficients etc would potentially
be the biggest saver assuming wide bandwidth systems. Ideally the
compiler would do this for us implicitly, but in the meantime explicit
reuse with optional register stages to reduce fanout would be awesome.
PS, Sorry, I'm a bit busy right now so I can't implement a
coefficient interpolator for you guys. I'll write
back when I'm more free.
Got a bit carried away and implemented one. Attached is a model that
allows the comparison between ideal, interpolator, and Dan's reduced
storage idea. The interpolator uses a multiplier; cruder versions might
avoid one at the cost of noise and/or more logic.
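For readers without the attached model, a minimal Python sketch of one
way an interpolator with a single multiplier could work. This is plain
linear interpolation between neighbouring table entries and is an
assumption, not a description of the attached design; the table size, the
upsampling factor and all function names are illustrative.

```python
import math

def build_table(size):
    """One full period of a sine, as it might be stored in BRAM."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]

def interpolated_sine(table, index, factor):
    """Sample `index` of a sine with len(table)*factor points per period."""
    base, frac = divmod(index, factor)
    a = table[base]
    b = table[(base + 1) % len(table)]       # wrap at the end of the period
    return a + (b - a) * (frac / factor)     # the single multiply

def max_interp_error(size=256, factor=4):
    """Worst absolute error against the ideal sine."""
    table = build_table(size)
    total = size * factor
    return max(
        abs(interpolated_sine(table, i, factor)
            - math.sin(2.0 * math.pi * i / total))
        for i in range(total)
    )

if __name__ == "__main__":
    print(max_interp_error())
```

The fraction takes only `factor` distinct values, so for small factors
the multiply could be replaced by shifts and adds.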
PS2. I'm a bit anal about noise performance so I usually use
a couple more bits than Dan prescribes, but as he demonstrated
in the ASIC talks, his comments about bit widths are 100%
correct. I would recommend them as a general design practice
as well.
I have also seen papers that show that FFT performance is more dependent
on data path bit width than coefficient bit width. We need a proper
study on how many bits are required for different performance levels.
but for long transforms, perhaps >4K points or so, then BRAM's might be
in short supply, and then one could consider storing fewer coefficients
(and also taking advantage of sin/cos and mirror symmetries, which don't
degrade SNR at all).
Did some work a while back. Attached is a model (sincos_upgrade.mdl)
that implements BRAM saving in different ways when generating FFT
twiddle factors (or DDC coefficients):
1. For very small numbers of coefficients, store them in the same word
(can output up to 36 bits from a BRAM so can store 18 bit sin and cos
values next to each other in the same word) so that we use 1 instead of
(current) 2 BRAMs. (see sincos_single_bram in the design)
2. Store only a quarter of a sinusoid and generate the complex
exponential via clever address generation and inversion of the output.
This uses 1 BRAM instead of (current, assuming a 'large' FFT) 8 at the
cost of logic (and multipliers) (see sincos_min_ram in the design)
3. Store half a sinusoid and generate the complex exponential via clever
address generation. Uses 1 BRAM instead of the (current, assuming a
'large' FFT) 4 at the cost of some logic. (see sincos_med_ram in the
design).
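The quarter-sinusoid option (item 2) can be sketched in Python as below.
This is not taken from sincos_upgrade.mdl; it is a minimal model under
assumptions: the table holds n_points/4 + 1 entries so that sin(pi/2) is
stored exactly (a hardware version would likely handle that sample
differently to keep the depth a power of two), n_points is divisible by
4, and all names are illustrative.

```python
import math

def build_quarter_table(n_points):
    """First quarter of a sine period, end point included."""
    quarter = n_points // 4
    return [math.sin(2.0 * math.pi * i / n_points)
            for i in range(quarter + 1)]

def sin_lookup(table, n_points, k):
    """sin(2*pi*k/n_points) from the quarter-wave table."""
    quarter = n_points // 4
    quadrant, offset = divmod(k % n_points, quarter)
    if quadrant % 2 == 1:            # quadrants 1 and 3: read backwards
        offset = quarter - offset
    value = table[offset]
    return -value if quadrant >= 2 else value   # second half: negate

def twiddle(table, n_points, k):
    """exp(-2j*pi*k/n_points); the cosine is the sine a quarter period on."""
    cos_val = sin_lookup(table, n_points, k + n_points // 4)
    sin_val = sin_lookup(table, n_points, k)
    return complex(cos_val, -sin_val)

if __name__ == "__main__":
    table = build_quarter_table(1024)
    print(twiddle(table, 1024, 128))
```

The two top address bits select the quadrant, which is where the "clever
address generation and inversion of the output" comes in: one bit mirrors
the address and the other negates the result.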
The interpolator could be integrated into these to use even less BRAM.
I will upgrade the library at some point this year to include these (and
the interpolator).
I did some tests this morning with a simple moving average filter to
turn 256 BRAM coefficients into 1024 (see attached model file), and it
looks pretty promising: errors are a max of about 2.5%.
Could you send me this file? I would like to see how you did your
interpolation.
Regards
Andrew