I think we're mostly on the same page, Ethan. Though even with STFT-domain time-variant filtering (such as noise reduction or mask-based source separation), it seems you could still zero-pad each input frame to eliminate any time-aliasing. As you mention (paraphrasing), you can smooth the mask, which reduces the amount of zero-padding you need, but if you have a KxN STFT (K frequency components and N frames) then zero-padding each frame by K-1 should still eliminate any time-aliasing even if your filter has hard edges in the frequency domain, right?
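A minimal NumPy sketch of this zero-padding argument, under some assumptions that are not from the thread: frames are K samples long with a full K-point complex DFT, the window is rectangular with hop equal to K, and the mask H is held fixed across frames so the result can be compared against plain linear convolution. The function name `framewise_filter` and the test signal are made up for illustration.

```python
import numpy as np

def framewise_filter(x, H, pad):
    """Apply a K-bin frequency response H to non-overlapping K-sample frames.

    Each frame is zero-padded by `pad` samples before the FFT and the
    filtered frames are overlap-added (rectangular window, hop == K).
    """
    K = len(H)
    nfft = K + pad
    h = np.fft.ifft(H)              # K-tap impulse response implied by the mask
    Hp = np.fft.fft(h, nfft)        # the same filter sampled on the padded grid
    n_frames = -(-len(x) // K)      # ceiling division
    xp = np.concatenate([x, np.zeros(n_frames * K - len(x))])
    y = np.zeros(n_frames * K + pad, dtype=complex)
    for m in range(n_frames):
        frame = xp[m * K:(m + 1) * K]
        # np.fft.fft(frame, nfft) zero-pads the frame to nfft samples
        y[m * K:m * K + nfft] += np.fft.ifft(np.fft.fft(frame, nfft) * Hp)
    return y

rng = np.random.default_rng(0)
K = 16
H = np.zeros(K)
H[:4] = 1.0                         # hard-edged low-pass mask...
H[-3:] = 1.0                        # ...mirrored so the impulse response is real
x = rng.standard_normal(5 * K)

ref = np.convolve(x, np.fft.ifft(H))      # linear (non-circular) reference
y_pad = framewise_filter(x, H, K - 1)     # zero-padded by K-1
y_nopad = framewise_filter(x, H, 0)       # no padding: circular per frame

print(np.allclose(y_pad, ref))
print(np.allclose(y_nopad, ref[:len(y_nopad)]))
```

With K-1 samples of padding the frame-by-frame result should match the linear convolution, while the unpadded version wraps each frame's tail back onto itself. One caveat: `Hp` is the K-bin mask interpolated onto the longer FFT grid, so it is no longer hard-edged there. The sketch shows that the K-tap filter implied by the mask can be applied without time-aliasing; it does not show that a mask specified directly on the padded grid would be alias-free.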
I understand the role of time-domain windowing in STFT processing to be mostly:

1. Reduce frequency-domain ripple (side-lobes in each band)
2. Provide a sort of cross-fade from frame-to-frame to smooth out framing effects

In my mind doing STFT-domain masking/filtering is _roughly_ equivalent to a filter bank with time-varying response. In the STFT case though you're keeping things invariant within each frame and then cross-fading from frame to frame.

This is a pretty intuitive/ad-hoc way of thinking on my part though - I'd love to see some literature that gives a more formal treatment.

-s

On Mon, Mar 9, 2020, at 12:52 AM, Ethan Duni wrote:
>
> On Sun, Mar 8, 2020 at 8:02 PM Spencer Russell <s...@media.mit.edu> wrote:
>> In fact, the standard STFT analysis/synthesis pipeline is the same thing
>> as overlap-add "fast convolution" if you:
>>
>> 1. Use a rectangular window with a length equal to your hop size
>> 2. zero-pad each input frame by the length of your FIR kernel minus 1
>
> Indeed, the two ideas are closely related and can be combined. It's more a
> difference in the larger approach.
>
> If you can specify the desired response in terms of an FIR of some fixed
> length, then you can account for the circular effects and use fast FIR. Note
> that this is a time-variant MIMO system constructed to be equivalent to a
> time-invariant SISO system (modulo finite word length effects, as you say).
>
> Alternatively, the desired response can be specified in the STFT domain. This
> comes up naturally in situations where it is estimated in the frequency
> domain to begin with, such as noise suppression or channel equalization.
> Then, circular convolution effects are controlled through a combination of
> pre/post windowing and smoothing/conditioning of the frequency response.
> Unlike the fast FIR case, the time-variant effects are only approximately
> suppressed: this is a time-variant MIMO system that is *not* equivalent to
> any time-invariant SISO system.
>
> So there is an extra layer of engineering needed in STFT systems to ensure
> that time domain aliasing is adequately suppressed. With fast FIR, you just
> calculate the correct size to zero-pad (or delete), and then there is no
> aliasing to worry about.
>
> Ethan
> _______________________________________________
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp