Hi Robert

On Wed, Mar 11, 2020 at 4:19 PM robert bristow-johnson <
r...@audioimagination.com> wrote:

>
> i don't think it's too generic for "STFT processing".  step #4 is pretty
> generic.
>

I think the part that chafes my intuition is more that the windows in steps
#2 and #6 should "match" in some way, and obey an appropriate perfect
reconstruction condition. I think of STFT as intentionally wiping out any
spill-over effects between frames with synthesis windowing, to impose a
particular time-frequency tiling, whereas fast convolution is defined by
how it explicitly accounts for spill-over between frames.

My intuition isn't definitive, but that's what comes to mind. In any case,
"STFT processing" is a very generic term.


>
> here is my attempt to quantitatively define and describe the STFT:
>
>
> https://dsp.stackexchange.com/questions/45625/is-windowed-fourier-transform-a-synonym-for-stft/45631#45631
>


Cool, that's a helpful reference for this stuff.

In terms of "what even is STFT", it seems there is more consensus on the
analysis part. Many STFT applications don't involve any synthesis or
filtering, but only frequency domain parameter estimation. For
analysis-only, probably everyone agrees that STFT consists of some Constant
OverLap-Add (COLA) window scheme, followed by a DFT. Rectangular windows are
a perfectly valid choice here, albeit one with poor sidelobe suppression.
Note that there are two potential layers of oversampling available: one
from overlapped windows, and another from zero-padding.
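
To make the two oversampling layers concrete, here is a minimal NumPy
sketch of the analysis side only. The function names, the periodic Hann
window and the sizes (N=512, hop=128, nfft=1024) are my own illustrative
choices, not anything fixed by the discussion above.

```python
import numpy as np

def cola_sum(window, hop):
    """Sum the hop-shifted copies of `window` over one hop period."""
    total = np.zeros(hop)
    for start in range(0, len(window), hop):
        seg = window[start:start + hop]
        total[:len(seg)] += seg
    return total

def stft_analysis(x, window, hop, nfft):
    """Windowed frames followed by a zero-padded DFT, one row per frame."""
    win_len = len(window)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    # rfft zero-pads each frame from win_len up to nfft samples
    return np.fft.rfft(frames, n=nfft, axis=1)

# Illustrative sizes (assumptions): 4x overlap and 2x zero-padding
N, hop, nfft = 512, 128, 1024
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)  # periodic Hann

# COLA check: the overlapped windows sum to a constant
print(np.allclose(cola_sum(hann, hop), cola_sum(hann, hop)[0]))

x = np.random.default_rng(0).standard_normal(4096)
X = stft_analysis(x, hann, hop, nfft)
print(X.shape)  # (number of frames, nfft // 2 + 1)
```

Here N/hop is the overlap-type oversampling factor and nfft/N is the
zero-padding-type factor; a rectangular window with hop == N passes the
same COLA check with both factors at 1.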

To summarize my understanding of your earlier remarks, the situation gets
fuzzier for synthesis. Broadly, there are two basic approaches. One is to
keep the COLA analysis and use raw (unwindowed) overlap-add for synthesis.
The other is to add synthesis windows, in which case the perfect
reconstruction (PR) condition becomes COLA on the product of the analysis
and synthesis windows (I'd call this "STFT filter bank" or maybe "FFT phase
vocoder" depending on the audience/application).
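
A small sketch of the second approach, assuming a root-Hann window used
for both analysis and synthesis at 50% overlap (my choice for
illustration; any pair whose product is COLA should behave the same way).
With no spectral modification, the interior of the signal comes back
unchanged.

```python
import numpy as np

def wola_roundtrip(x, win_a, win_s, hop):
    """Analysis window -> DFT -> inverse DFT -> synthesis window -> overlap-add."""
    win_len = len(win_a)
    y = np.zeros(len(x))
    for s in range(0, len(x) - win_len + 1, hop):
        spec = np.fft.rfft(x[s:s + win_len] * win_a)
        # (a spectral modification would go here)
        frame = np.fft.irfft(spec, n=win_len)
        y[s:s + win_len] += frame * win_s
    return y

# Illustrative sizes (assumptions)
N, hop = 512, 256
n = np.arange(N)
root_hann = np.sin(np.pi * n / N)  # square root of a periodic Hann window

# PR condition: the *product* of analysis and synthesis windows is COLA
product = root_hann * root_hann
print(np.allclose(product[:hop] + product[hop:], 1.0))

x = np.random.default_rng(1).standard_normal(8 * N)
y = wola_roundtrip(x, root_hann, root_hann, hop)

# The first and last frames have no overlapping neighbour, so compare the interior
interior = slice(N, len(x) - N)
print(np.allclose(y[interior], x[interior]))
```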

The first approach has immediate problems if the DFT values are modified,
because the COLA condition is not enforced on the output. For the special
case that the modification is multiplication by a DFT kernel that
corresponds to a length-K FIR filter, this can be accommodated by
zero-padding type oversampling (at least K-1 zeros per frame), which
results in the Overlap-Add flavor of fast convolution to account for the
inter-frame effects. Note that this implicitly extends the (raw)
overlap-add region in synthesis accordingly: the analysis windows obey
COLA, but the synthesis "windows" have different support and are not part
of the PR condition.
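
For reference, the rectangular-window case can be sketched in a few lines
of NumPy. The function name, frame length and test signal are
hypothetical; the point is only that each output frame is K-1 samples
longer than its input frame and the tails are summed raw.

```python
import numpy as np

def ola_fast_convolve(x, h, frame_len):
    """Overlap-add fast convolution with non-overlapping rectangular frames."""
    K = len(h)
    # DFT size must hold a full linear convolution of one frame: frame_len + K - 1
    nfft = 1
    while nfft < frame_len + K - 1:
        nfft *= 2
    H = np.fft.rfft(h, n=nfft)
    y = np.zeros(len(x) + K - 1)
    for s in range(0, len(x), frame_len):
        frame = x[s:s + frame_len]  # rectangular window, hop == frame_len
        out = np.fft.irfft(np.fft.rfft(frame, n=nfft) * H, n=nfft)
        L = len(frame) + K - 1      # synthesis support is longer than the frame
        y[s:s + L] += out[:L]       # raw overlap-add, no synthesis window
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(33)         # length-K FIR, K = 33 (arbitrary)
y = ola_fast_convolve(x, h, frame_len=128)
print(np.allclose(y, np.convolve(x, h)))
```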

As you point out, this works for any COLA analysis window scheme, not just
rectangular, although the efficiency is correspondingly reduced with
overlap. This system is equivalent to a SISO FIR, up to finite word length
effects. Note that this equivalence happens because we are adding an
additional time-variant stage (zero-padding/raw OLA), to explicitly correct
for the time-variant effects of the underlying DFT operation. This is the
block processing analog of upsampling a scalar signal by K so that we can
apply an order-K polynomial nonlinearity without aliasing.
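
The equivalence for a general COLA analysis window can be written out in
a few lines. The notation is my own: w[n] is the analysis window, R the
hop, h[n] the length-K FIR, and x_m, y_m the m-th input and output
frames.

```latex
% Assumed notation: w = analysis window, R = hop, h = length-K FIR
\begin{align*}
x_m[n] &= w[n - mR]\, x[n], \qquad \sum_m w[n - mR] = 1 \quad \text{(COLA)} \\
y_m[n] &= (h * x_m)[n] \quad \text{(linear, because each frame is zero-padded by at least } K-1 \text{ samples)} \\
y[n]   &= \sum_m y_m[n] = \Big( h * \sum_m x_m \Big)[n] = (h * x)[n]
\end{align*}
```

The last step uses only linearity of convolution and the COLA sum, which
is why the window shape does not matter here, only the cost.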

The synthesis window approach is more general in the types of modifications
that can be accommodated (spectral subtraction, nonlinear operations,
etc.). This is because it allows time domain aliasing to occur, but
explicitly suppresses it by attenuating the frame edges. This is also
throwing oversampling at the problem, but of the overlap type instead of
the zero-padding type.
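
One way to see the edge attenuation in isolation is a single-frame
sketch. I assume a root-Hann window and use a pure d-sample delay as the
spectral modification, simply because its wrap-around is easy to locate;
neither choice comes from the thread.

```python
import numpy as np

# Illustrative sizes (assumptions): frame length N, delay d
N, d = 512, 16
n = np.arange(N)
root_hann = np.sin(np.pi * n / N)

rng = np.random.default_rng(2)
raw = rng.standard_normal(N)
frame = raw * root_hann                       # analysis-windowed frame

# Modification: a d-sample delay as a linear phase ramp, with no zero-padding
k = np.arange(N // 2 + 1)
delayed = np.fft.irfft(np.fft.rfft(frame) * np.exp(-2j * np.pi * k * d / N), n=N)

# The DFT makes the delay circular: the last d samples wrap to the frame start
print(np.allclose(delayed, np.roll(frame, d)))

aliased = delayed[:d]                          # time-domain aliasing
aliased_after_synthesis = aliased * root_hann[:d]

ratio = np.sum(aliased_after_synthesis**2) / np.sum(aliased**2)
print(10 * np.log10(ratio), "dB of extra attenuation from the synthesis window")
```

The wrapped samples were already attenuated once by the analysis window
at the far edge of the frame; the synthesis window attenuates them again
at the near edge. A larger d (a "harder push" on the response) lands
further into the window and is suppressed less.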

You can also apply zero-padding on top of synthesis windows to further
increase the margin for circular aliasing. However, unlike fast convolution,
you would still apply the synthesis windows to remove spill-over between
frames and not use raw OLA. This is required for the filter bank PR
condition. There is no equivalent SISO system in this case. The level of
aliasing is determined by how hard you push on the response, and how much
overlap/zero-padding you can afford. I.e., it's ultimately engineered/tuned
rather than designed out explicitly as in fast convolution.

We're all on the same page on this stuff, I hope?

Ethan
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp