Hi Robert

On Wed, Mar 11, 2020 at 4:19 PM robert bristow-johnson <r...@audioimagination.com> wrote:

> i don't think it's too generic for "STFT processing". step #4 is pretty
> generic.

I think the part that chafes my intuition is more that the windows in steps #2 and #6 should "match" in some way, and obey an appropriate perfect reconstruction condition. I think of STFT as intentionally wiping out any spill-over effects between frames with synthesis windowing, to impose a particular time-frequency tiling. Whereas fast convolution is defined by how it explicitly accounts for spill-over between frames. My intuition isn't definitive, but that's what comes to mind. In any case, "STFT processing" is a very generic term.

> here is my attempt to quantitatively define and describe the STFT:
>
> https://dsp.stackexchange.com/questions/45625/is-windowed-fourier-transform-a-synonym-for-stft/45631#45631

Cool, that's a helpful reference for this stuff.

In terms of "what even is STFT", it seems there is more consensus on the analysis part. Many STFT applications don't involve any synthesis or filtering, but only frequency domain parameter estimation. For analysis-only, probably everyone agrees that STFT consists of some Constant OverLap Add (COLA) window scheme, followed by DFT. Rectangular windows are a perfectly valid choice here, albeit one with poor sidelobe suppression. Note that there are two potential layers of oversampling available: one from overlapped windows, and another from zero-padding.

To summarize my understanding of your earlier remarks, the situation gets fuzzier for synthesis. Broadly, there are two basic approaches. One is to keep the COLA analysis and use raw (unwindowed) overlap-add for synthesis. The other is to add synthesis windows, in which case the PR condition becomes COLA on the product of the analysis and synthesis windows (I'd call this "STFT filter bank" or maybe "FFT phase vocoder" depending on the audience/application).
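
The two PR conditions summarized above can be checked numerically. The following is only a minimal sketch under assumed parameters that are not from the thread: a periodic Hann window of length 512 with hop 256, and a square-root Hann used as both analysis and synthesis window for the second approach. The helper name `overlap_sum` is hypothetical.

```python
import numpy as np

def overlap_sum(window, hop, n_frames=8):
    """Sum copies of `window` shifted by multiples of `hop`.

    Returns only the steady-state middle of the sum, where every
    overlapping frame is present (hypothetical helper for illustration).
    """
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for m in range(n_frames):
        total[m * hop : m * hop + n] += window
    # Drop one window length at each end (ramp-up / ramp-down regions).
    return total[n : len(total) - n]

# Assumed parameters, chosen only for illustration.
N, hop = 512, 256
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)  # periodic Hann
root_hann = np.sqrt(hann)                     # candidate analysis/synthesis pair

# Approach 1: COLA on the analysis window alone (raw OLA synthesis).
cola_analysis = overlap_sum(hann, hop)

# Approach 2: COLA on the product of analysis and synthesis windows.
cola_product = overlap_sum(root_hann * root_hann, hop)

# The root-Hann window by itself is not COLA at this hop.
cola_root_alone = overlap_sum(root_hann, hop)

print(np.allclose(cola_analysis, 1.0))
print(np.allclose(cola_product, 1.0))
print(cola_root_alone.min(), cola_root_alone.max())
```

Under these assumptions the first two sums are constant at 1, while the root-Hann window alone ripples between 1 and roughly 1.414, which is why it only satisfies the PR condition when paired with a matching synthesis window.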

The first approach has immediate problems if the DFT values are modified, because the COLA condition is not enforced on the output. For the special case that the modification is multiplication by a DFT kernel that corresponds to a length-K FIR filter, this can be accommodated by zero-padding type oversampling, which results in the Overlap-Add flavor of fast convolution to account for the inter-frame effects. Note that this implicitly extends the (raw) overlap-add region in synthesis accordingly: the analysis windows obey COLA, but the synthesis "windows" have different support and are not part of the PR condition. As you point out, this works for any COLA analysis window scheme, not just rectangular, although the efficiency is correspondingly reduced with overlap.

This system is equivalent to a SISO FIR, up to finite word length effects. Note that this equivalence happens because we are adding an additional time-variant stage (zero-padding/raw OLA), to explicitly correct for the time-variant effects of the underlying DFT operation. This is the block processing analog of upsampling a scalar signal by K so that we can apply an order-K polynomial nonlinearity without aliasing.

The synthesis window approach is more general in the types of modifications that can be accommodated (spectral subtraction, nonlinear operations, etc.). This is because it allows time domain aliasing to occur, but explicitly suppresses it by attenuating the frame edges. This is also throwing oversampling at the problem, but of the overlap type instead of the zero-padding type. You can also apply zero-padding on top of synthesis windows to further increase the margin for circular aliasing. However, unlike fast convolution, you would still apply the synthesis windows to remove spill-over between frames and not use raw OLA. This is required for the filter bank PR condition. There is no equivalent SISO system in this case.
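
As a concrete check of the fast-convolution case (the first approach), here is a rough sketch of overlap-add with a COLA analysis window, zero-padded DFTs, and raw OLA synthesis, compared against direct convolution. The function name `ola_fast_convolve` and all parameter values (frame length 64, hop 32, K = 17, FFT size 128) are assumptions made for illustration, and the window is assumed to overlap-add to exactly 1 at the given hop.

```python
import numpy as np

def ola_fast_convolve(x, h, window, hop, nfft):
    """Overlap-add fast convolution with an analysis window (illustrative sketch).

    Assumes `window` overlap-adds to 1 at spacing `hop`. There is no
    synthesis window: each filtered frame is added back raw.
    """
    N = len(window)
    H = np.fft.rfft(h, nfft)  # DFT kernel of the length-K FIR, zero-padded
    # Pad with zeros so every input sample lies in the steady-state COLA region.
    xp = np.concatenate([np.zeros(N), x, np.zeros(N)])
    n_frames = (len(xp) - N) // hop + 1
    y = np.zeros(len(xp) + nfft)
    for m in range(n_frames):
        start = m * hop
        frame = window * xp[start : start + N]
        # Zero-padding to nfft leaves room for the filter's spill-over.
        filtered = np.fft.irfft(np.fft.rfft(frame, nfft) * H, nfft)
        y[start : start + nfft] += filtered  # raw OLA over the extended support
    return y[N : N + len(x) + len(h) - 1]

rng = np.random.default_rng(0)
N, hop, K = 64, 32, 17
x = rng.standard_normal(1000)
h = rng.standard_normal(K)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann

ref = np.convolve(x, h)                                  # direct FIR reference
y_hann = ola_fast_convolve(x, h, hann, hop, nfft=128)    # nfft >= N + K - 1
y_rect = ola_fast_convolve(x, h, np.ones(N), N, nfft=128)
y_alias = ola_fast_convolve(x, h, hann, hop, nfft=64)    # nfft too small

print(np.max(np.abs(y_hann - ref)))
print(np.max(np.abs(y_rect - ref)))
print(np.max(np.abs(y_alias - ref)))
```

With nfft at least N + K - 1, both the Hann and rectangular variants should match the direct FIR output to within rounding error. The last call drops the zero-padding, so the filter tail wraps around inside each frame and the match is lost.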
The level of aliasing is determined by how hard you push on the response, and how much overlap/zero-padding you can afford. I.e., it's ultimately engineered/tuned rather than designed out explicitly as in fast convolution.

We're all on the same page on this stuff, I hope?

Ethan
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp