I think we're mostly on the same page, Ethan. Though even with STFT-domain 
time-variant filtering (such as with noise reduction, or mask-based source 
separation) it would seem you could still zero-pad each input frame to 
eliminate any issues due to time-aliasing. As you mention (paraphrasing), you 
can smooth out the mask which will reduce the amount of zero-padding you need, 
but if you have a KxN STFT (K frequency components and N frames) then 
zero-padding each frame by K-1 should still eliminate any time-aliasing even if 
your filter has hard edges in the frequency domain, right?
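
To make the zero-padding argument concrete, here is a rough NumPy sketch for a single frame. Everything in it is an assumption for illustration: the frame length equals K, the mask is a real, conjugate-symmetric brick-wall (so its impulse response is real), and the names (H, h, n_fft, and so on) are made up. It checks that a K-bin mask acts as a K-tap filter, that padding the frame by K-1 gives plain linear convolution, and that the unpadded result is the same linear convolution with its tail wrapped around.

```python
import numpy as np

K = 16                                  # bins per frame (assumed equal to frame length)
rng = np.random.default_rng(0)
x = rng.standard_normal(K)              # one frame of input

# Hard-edged mask on K bins, chosen conjugate-symmetric (H[k] == H[K-k]).
H = np.zeros(K)
H[:4] = 1.0
H[-3:] = 1.0

# A mask sampled on K bins corresponds to an impulse response of at most K taps.
h = np.fft.ifft(H).real

# No zero-padding: multiplying K-point spectra is circular convolution.
y_circ = np.fft.ifft(np.fft.fft(x) * H).real

# Zero-pad frame and impulse response by K-1, so the product is a linear convolution.
n_fft = 2 * K - 1
y_padded = np.fft.ifft(np.fft.fft(x, n_fft) * np.fft.fft(h, n_fft)).real

# Reference: direct time-domain linear convolution (length 2K-1).
y_linear = np.convolve(x, h)

# The circular result is the linear result with its tail folded back onto the start.
aliased = y_linear[:K].copy()
aliased[:K - 1] += y_linear[K:]

print("padded vs linear, max error:", np.max(np.abs(y_padded - y_linear)))
print("circular vs folded linear, max error:", np.max(np.abs(y_circ - aliased)))
```

Note that the mask actually applied on the 2K-1 point grid is fft(h, n_fft), i.e. the original K-bin mask interpolated onto the finer grid, not a hard-edged mask redrawn at the higher resolution.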

I understand the role of time-domain windowing in STFT processing to be mostly:
1. Reduce frequency-domain ripple (side-lobes in each band)
2. Provide a sort of cross-fade from frame-to-frame to smooth out framing 
effects

In my mind doing STFT-domain masking/filtering is _roughly_ equivalent to a 
filter bank with time-varying response. In the STFT case though you're keeping 
things invariant within each frame and then cross-fading from frame to frame. 
This is a pretty intuitive/ad-hoc way of thinking on my part though - I'd love 
to see some literature that gives a more formal treatment.
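
A toy version of the cross-fade picture, as a sketch rather than a formal treatment: assume a sine window used for both analysis and synthesis at 50% overlap, and a mask that is just one gain per frame (flat across frequency). Those choices and all the names are my own assumptions. In that simplified case the STFT output is exactly the input multiplied by a gain envelope that cross-fades between neighbouring frame gains.

```python
import numpy as np

K = 64                                   # frame length (assumed)
hop = K // 2                             # 50% overlap
num_frames = 8
n = np.arange(K)
w = np.sin(np.pi * (n + 0.5) / K)        # sine window: w**2 overlap-adds to 1 at this hop

rng = np.random.default_rng(1)
x = rng.standard_normal(hop * (num_frames + 1))
gains = rng.uniform(0.2, 1.0, num_frames)   # one frequency-flat "mask" value per frame

y = np.zeros_like(x)
envelope = np.zeros_like(x)
for m, g in enumerate(gains):
    start = m * hop
    frame = x[start:start + K] * w           # analysis window
    spec = np.fft.rfft(frame)
    spec *= g                                # response held fixed within the frame
    y[start:start + K] += np.fft.irfft(spec, K) * w   # synthesis window + overlap-add
    envelope[start:start + K] += g * w**2    # the implied cross-fade between frame gains

print("max |y - x*envelope|:", np.max(np.abs(y - x * envelope)))
```

With a mask that varies over frequency the same cross-fade happens per band, but the identity above is no longer exact because each frame's filter also smears the windowed signal in time.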

-s

On Mon, Mar 9, 2020, at 12:52 AM, Ethan Duni wrote:
> 
> 
> On Sun, Mar 8, 2020 at 8:02 PM Spencer Russell <s...@media.mit.edu> wrote:
>> In fact, the standard STFT analysis/synthesis pipeline is the same thing 
>> as overlap-add "fast convolution" if you:
>> 
>> 1. Use a rectangular window with a length equal to your hop size
>> 2. zero-pad each input frame by the length of your FIR kernel minus 1
> 
> Indeed, the two ideas are closely related and can be combined. It's more a 
> difference in the larger approach. 
> 
> If you can specify the desired response in terms of an FIR of some fixed 
> length, then you can account for the circular effects and use fast FIR. Note 
> that this is a time-variant MIMO system constructed to be equivalent to a 
> time-invariant SISO system (modulo finite word length effects, as you say). 
> 
> Alternatively, the desired response can be specified in the STFT domain. This 
> comes up naturally in situations where it is estimated in the frequency 
> domain to begin with, such as noise suppression or channel equalization. 
> Then, circular convolution effects are controlled through a combination of 
> pre/post windowing and smoothing/conditioning of the frequency response. 
> Unlike the fast FIR case, the time-variant effects are only approximately 
> suppressed: this is a time-variant MIMO system that is *not* equivalent to 
> any time-invariant SISO system. 
> 
> So there is an extra layer of engineering needed in STFT systems to ensure 
> that time domain aliasing is adequately suppressed. With fast FIR, you just 
> calculate the correct size to zero-pad (or delete), and then there is no 
> aliasing to worry about. 
> 
> Ethan
> _______________________________________________
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp