On Tue, Mar 10, 2020 at 8:36 AM Spencer Russell <s...@media.mit.edu> wrote:

>
> The point I'm making here is that overlap-add fast FIR is a special case
> of STFT-domain multiplication and resynthesis. I'm defining the standard
> STFT pipeline here as:
>
> 1. slice your signal into frames
> 2. pointwise-multiply an analysis window by each frame
> 3. perform `rfft` on each frame to give the STFT domain representation
> 4. modify the STFT representation
> 5. perform `irfft` on each frame
> 6. pointwise-multiply a synthesis window on each frame
> 7. overlap-add each frame to get the resulting time-domain signal
>

I don't think there is a precise definition of STFT, but IMO this is too
generic. The fundamental design parameters for an STFT system are the
window shapes and overlaps, but in fast convolution those degrees of
freedom are eliminated entirely.
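
To make that concrete, here is a rough NumPy sketch of overlap-add fast convolution written out as the seven-step pipeline quoted above. The function name `ola_fast_fir`, the frame length of 128, and the random test data are my own placeholders, not anything from the thread. Note how both windows collapse to rectangles and the hop is pinned to the frame length, so there is nothing left to design:

```python
import numpy as np

def ola_fast_fir(x, h, frame_len):
    """Overlap-add fast convolution, laid out as the 7-step STFT pipeline."""
    K = len(h)
    nfft = frame_len + K - 1              # zero-padded FFT size
    H = np.fft.rfft(h, nfft)              # fixed, precomputed filter spectrum
    n_frames = -(-len(x) // frame_len)    # ceiling division
    xp = np.pad(x, (0, n_frames * frame_len - len(x)))
    y = np.zeros(n_frames * frame_len + K - 1)
    for m in range(n_frames):
        start = m * frame_len
        frame = xp[start:start + frame_len]   # 1. slice (hop == frame_len)
        # 2. analysis window is rectangular: nothing to multiply
        X = np.fft.rfft(frame, nfft)          # 3. rfft (zero-pads to nfft)
        Y = X * H                             # 4. modify the STFT frame
        yf = np.fft.irfft(Y, nfft)            # 5. irfft
        # 6. synthesis window is rectangular: nothing to multiply
        y[start:start + nfft] += yf           # 7. overlap-add the K-1 tail
    return y[:len(x) + K - 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(33)
y = ola_fast_fir(x, h, frame_len=128)
max_err = np.max(np.abs(y - np.convolve(x, h)))  # rounding error only
```

The result matches direct convolution up to floating-point rounding, which is exactly the sense in which no window or overlap choice remains.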

The reason this distinction is important is that STFT is for cases where
you want to estimate the response in the frequency domain. If you can't
apply a useful analysis window, then there isn't much point.

If you already have your response expressed as length-K FIRs in the time
domain, then you don't need STFT. You just apply the FIRs directly (using
fast convolution if appropriate). STFT is not an attractive topology just
for implementing time-varying FIR, as such.


>
> This is just to make the point that fast FIR is a special case of STFT
> processing.


So, a useful distinction here is "oversampled filterbanks". The way that
fast convolution works is through oversampling/zero-padding. This creates
margin in the time domain to accommodate aliasing. You can apply
oversampling - in any amount - to an STFT system to reduce the aliasing at
the cost of increased overhead. Fast convolution is sort of a corner case,
where you can eliminate the aliasing entirely with finite oversampling, at
the cost of losing your analysis and synthesis windowing (and introducing
extra spill-over in the synthesis).
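
As a small illustration of the "finite oversampling" corner case, the sketch below filters a single length-N frame with a length-K FIR via the FFT at several FFT sizes and measures how far the result is from linear convolution. The sizes N = 64 and K = 17 and the helper name `alias_error` are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 64, 17
frame = rng.standard_normal(N)
h = rng.standard_normal(K)
ref = np.convolve(frame, h)               # linear result, length N + K - 1

def alias_error(nfft):
    """Max deviation of FFT-domain filtering from linear convolution."""
    y = np.fft.irfft(np.fft.rfft(frame, nfft) * np.fft.rfft(h, nfft), nfft)
    target = np.zeros(nfft)
    n = min(nfft, len(ref))
    target[:n] = ref[:n]                  # linear result, cut or padded to nfft
    return np.max(np.abs(y - target))

# No padding, partial padding, exactly enough padding, more than enough.
errors = {nfft: alias_error(nfft) for nfft in (N, N + K // 2, N + K - 1, 2 * N)}
```

With less than K-1 samples of padding the tail of the response wraps around and the error is large; once the FFT length reaches N+K-1 it drops to rounding level, and further oversampling buys nothing. That only works because the filter is known to be exactly length K.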


> > Right, but if you are using length K FFT and zero-padding by K-1, then
> > the hop size is 1 sample and there are no windows.
>
> Whoops, this was dumb on my part. I was not referring to a hop size of 1!
> Hopefully my explanation above is more clear.
>

But, the reason I made this point is you specified that the FFTs are length
K. If you are using length N+K FFTs, then the estimated response is length
N+K, and we are back to the original problem of ensuring that a DFT
response is appropriately time limited.

This can be done by brute force of course: length N+K IFFT => length K
window => length N+K FFT. But that is the same complexity as the STFT
signal path! So, we're back to smoothing/conditioning in the frequency
domain, windowing in the time domain, and accepting some engineered level
of time domain aliasing. The difference is that the oversampling (N+K vs N)
has given us additional margin to accommodate it (i.e., we can tolerate
sharper filters).
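
For reference, the brute-force step looks something like the sketch below. It assumes a causal response and a plain rectangular length-K window, both of which are my simplifications (a tapered window, or a centered window for a zero-phase response, would be a different design choice). The name `time_limit_response` and the test response are hypothetical:

```python
import numpy as np

def time_limit_response(H, K, nfft):
    """IFFT -> length-K window -> FFT, for a real filter given as an rfft spectrum."""
    h = np.fft.irfft(H, nfft)     # length-nfft impulse response
    h[K:] = 0.0                   # rectangular length-K window (assumed causal)
    return np.fft.rfft(h, nfft)   # back to the frequency domain

N, K = 64, 16
nfft = N + K
# Stand-in for a response estimated at runtime: a decay longer than K taps.
h_long = 0.9 ** np.arange(nfft)
H_est = np.fft.rfft(h_long, nfft)

H_lim = time_limit_response(H_est, K, nfft)
h_lim = np.fft.irfft(H_lim, nfft)
tail = np.max(np.abs(h_lim[K:]))                  # should be ~0
head_err = np.max(np.abs(h_lim[:K] - h_long[:K])) # first K taps are untouched
```

Note that this is one inverse and one forward transform per frame just to condition the filter, which is the extra cost being pointed at above.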

Fast convolution is for cases where filter design is done offline: then all
those filter computations can be done ahead of time, there is no aliasing,
and everything is great! But if the filters get estimated at runtime then
you run into prohibitive costs. Since STFT is the latter case, in practice
fast convolution isn't much help. The two approaches are orthogonal in that
sense, despite their structural similarities.


>
> You could think of STFT multiplication as applying a different FIR filter
> to each frame and then cross-fading between them, which is clearly not the
> same as continually varying the FIR parameters in the time domain.


By linearity, cross-fading between two fixed FIRs is equivalent to applying
a single time-varying FIR with the coefficients cross-faded the same way
(at least for direct form). The synthesis window defines this cross-fade
behavior for STFT.
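
A quick numerical check of the linearity argument, as a sketch: run two fixed FIRs and cross-fade their outputs, then compare against a single direct-form FIR whose coefficients are cross-faded with the same gain at each output sample. The linear ramp `g`, the lengths, and the random data are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)
L, K = 200, 8
x = rng.standard_normal(L)
h0 = rng.standard_normal(K)
h1 = rng.standard_normal(K)
g = np.linspace(0.0, 1.0, L)          # cross-fade gain per output sample

# (a) Two fixed FIRs, cross-fade applied to their outputs.
y0 = np.convolve(x, h0)[:L]
y1 = np.convolve(x, h1)[:L]
y_xfade = (1.0 - g) * y0 + g * y1

# (b) One direct-form FIR whose coefficients are cross-faded the same way.
xp = np.concatenate([np.zeros(K - 1), x])   # zero initial state
y_tv = np.empty(L)
for n in range(L):
    h_n = (1.0 - g[n]) * h0 + g[n] * h1     # coefficients at output time n
    # xp[n:n+K] holds x[n-K+1..n]; reversing gives x[n], x[n-1], ...
    y_tv[n] = h_n @ xp[n:n + K][::-1]

max_diff = np.max(np.abs(y_xfade - y_tv))   # rounding error only
```

The two outputs agree to rounding error. The equivalence relies on the coefficients being applied at the output side, which is why the direct-form caveat matters.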

It's still not exactly equivalent, because the STFT is a time-variant MIMO
system that does not admit an exact SISO equivalent. However, the
difference is really more that it creates aliasing (which should be very
low level if properly designed), and not that you can't make an FIR with
comparable response. It's not trivial, but it is relatively straightforward
to take an STFT response, consider the window used to obtain it, and spit
out a corresponding FIR. These can then be interpolated according to the
synthesis windows to operate a comparable time-varying SISO FIR system.

Such a system might actually be preferable to the STFT synthesis chain,
since it wouldn't create circular convolution artifacts in the first place.
However, it is expensive - you're already running the STFT analysis chain,
converting the filter to the time domain is *more* expensive than the STFT
synthesis, and then you still have to run the time-varying FIR on top of
that.


> They do seem to have a tight relationship though, and when we do STFT
> modifications it seems that in some contexts we're trying to approximate
> the time-varying FIR filter.
>

Yeah, multiplying DFTs is fundamentally an "FIR-like" operation, and much
of the intuition applies. But this is only approximate for systems like
STFT that do not have SISO FIR equivalents. Specifically, for a real FIR
you have spill-over effects from frame to frame (which is what is accounted
for in fast convolution). In STFT this only happens within frames, and the
"ring-out" at the end wraps back around to the start of the frame (all of
which is suppressed by the synthesis windows).

This shouldn't be noticeable as long as the STFT design is kept within
bounds. But if you push too hard on it, the signal-to-alias ratio will
decline and the difference between it and a SISO FIR will be readily
apparent (in a bad way!).


>  Can you clarify what you mean by a form of aliasing? As mentioned above,
> with proper zero-padding there should be no time-aliasing introduced.


I mean that in general with STFT systems there is non-zero time domain
aliasing. This does not apply to fast convolution (up to finite word length
effects, as always).

Ethan