> On March 8, 2020 7:55 PM Ethan Duni <ethan.d...@gmail.com> wrote:
> 
> Fast FIR is a different thing than an FFT filter bank.
> 
> You can combine the two approaches but I don’t think that’s what is being 
> done here?

> On March 9, 2020 10:15 AM Spencer Russell <s...@media.mit.edu> wrote:
> 
> 
> I think we're mostly on the same page, Ethan.

well, i think that i am on the same page as Ethan.

> Though even with STFT-domain time-variant filtering (such as with noise 
> reduction, or mask-based source separation) it would seem you could still 
> zero-pad each input frame to eliminate any issues due to time-aliasing.

zero-padding is the sole technique that gets rid of time-aliasing.

let's say your FIR is of length L.  let's say that your frame hop is H and 
frame length is F ≥ H and we're doing overlap-add.  then your F samples of 
input (H samples are *new* samples in the current frame, F-H samples are 
remaining from the previous frame) are considered zero-padded out to infinity 
in both directions.  then the length of the result of linear convolution is 
L+F-1.  now if you can guarantee that the size of the DFT, which we'll call "N" 
(and most of the time is a power of 2) is at least as large as the non-zero 
length of the linear convolution, then the result of circular convolution of 
the zero-padded FIR and the zero-padded frame of samples will be exactly the 
same.  that means

   N ≥ L + F - 1

this is always true whether the windowing is rectangular or something else.  
and, whether your FIR varies in definition or not, the length L must never be 
longer than N-F+1.  all frequency responses (which is what you multiply with in 
the frequency domain) must be the N-point DFT of an FIR limited in length to 
N-F+1.
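a quick numerical check of that condition, as a minimal sketch in Python with NumPy.  the sizes L=5 and F=12, the random data, and the helper name circular_convolve are assumptions made up for this illustration, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)
L, F = 5, 12                    # example FIR length and frame length (assumed values)
N = L + F - 1                   # smallest DFT size that avoids time-aliasing (16 here)
h = rng.standard_normal(L)      # stand-in FIR
x = rng.standard_normal(F)      # stand-in frame of input samples

def circular_convolve(a, b, n):
    # zero-pad both sequences to n points, multiply the spectra, transform back
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

linear = np.convolve(h, x)                  # linear convolution, length L + F - 1
good = circular_convolve(h, x, N)           # N >= L + F - 1
bad = circular_convolve(h, x, F)            # N = F is too small: the L-1 tail wraps around

print(np.allclose(good, linear))
print(np.allclose(bad, linear[:F]))
print(np.allclose(bad[L - 1:], linear[L - 1:F]))
```

with these sizes the three checks should come out True, False, True: the full-size DFT reproduces the linear convolution, while the short one is corrupted only in its first L-1 samples, which is where the tail folds back.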

if it is a rectangular window, the frame length and frame hop are the same, 
F=H, and the number of generated output samples that are valid is H, and the 
most you can hope to get is:

    H = F = N - L + 1
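the rectangular-window case can be sketched as below.  the function name overlap_add_fir and its argument layout are hypothetical, and a fixed FIR is assumed so that the result can be compared against direct convolution.

```python
import numpy as np

def overlap_add_fir(x, h, N):
    """Sketch: filter x with FIR h using N-point FFTs and hop H = F = N - L + 1."""
    L = len(h)
    H = N - L + 1                      # frame hop equals frame length (rectangular window)
    if H < 1:
        raise ValueError("N must be at least len(h)")
    Hf = np.fft.rfft(h, N)             # N-point frequency response of the zero-padded FIR
    y = np.zeros(len(x) + L - 1)
    for start in range(0, len(x), H):
        frame = x[start:start + H]     # last frame may be short; rfft zero-pads it to N
        seg = np.fft.irfft(np.fft.rfft(frame, N) * Hf, N)
        stop = min(start + N, len(y))
        y[start:stop] += seg[:stop - start]   # the L-1 sample tail is added into the next frame
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(33)
y = overlap_add_fir(x, h, 128)         # H = 128 - 33 + 1 = 96 new samples per frame
print(np.allclose(y, np.convolve(x, h)))
```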

now, if you want to window that input data with a complementary window, such as 
the Hann window, that's fine, but instead of having the frame hop equal to the 
frame length, the relationship between the two is

    F = 2H - 1   or   H = (F+1)/2   (50% overlap)

so now, the number of valid output samples is about half as many as before.

    H = (F+1)/2 = (N-L)/2 + 1

so the input buffer to the FFT will still be zero-padded with N-F zeros, 
independent of the hop size.  but you get a bigger hop size and more output 
samples per frame with a rectangular window.  and in both cases you get exactly 
the same results (up to rounding error).
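a sketch of the Hann case with a fixed FIR, to check the "same results" claim numerically.  everything named here is an assumption for illustration: the function name, the choice to build the window as one period of a length-2H Hann with the zero endpoint dropped (so F = 2H - 1 nonzero samples that sum to 1 at hop H), and the H - 1 leading zeros that put the first input sample under a full window sum.  when N - L is odd the sketch uses F = N - L, one less than the maximum, so that F stays odd.

```python
import numpy as np

def hann_overlap_add_fir(x, h, N):
    """Sketch: overlap-add with a complementary Hann window, F = 2H - 1, 50% overlap."""
    L = len(h)
    H = (N - L + 2) // 2               # hop; equals (N - L)/2 + 1 when N - L is even
    if H < 1:
        raise ValueError("N must be at least len(h)")
    F = 2 * H - 1                      # frame length, so that N >= L + F - 1 still holds
    # Hann with period 2H, zero endpoint dropped: w[j] + w[j + H] == 1
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(1, F + 1) / (2 * H))
    Hf = np.fft.rfft(h, N)
    # H-1 zeros in front and F zeros behind, so every input sample sees a window sum of 1
    xp = np.concatenate([np.zeros(H - 1), x, np.zeros(F)])
    yp = np.zeros(len(xp) + N)
    for start in range(0, len(xp) - F + 1, H):
        frame = w * xp[start:start + F]            # windowed frame, zero-padded to N by rfft
        yp[start:start + N] += np.fft.irfft(np.fft.rfft(frame, N) * Hf, N)
    return yp[H - 1:H - 1 + len(x) + L - 1]        # undo the leading padding

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
h = rng.standard_normal(32)
y = hann_overlap_add_fir(x, h, 128)    # H = 49, F = 97, about half the rectangular hop
print(np.allclose(y, np.convolve(x, h)))
```

with a fixed filter the window only changes how the input is partitioned, so the output should match direct convolution up to rounding error, just at roughly twice the FFT cost per output sample.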

now, if it is overlap-scrap (or "overlap-save") and a rectangular window (which 
is no window at all, because the data is not zero-padded), the output samples 
are "butt spliced" (no crossfade).  if your FIR filter changes in frequency 
response, the new timbre of the filter is applied instantly with the new frame.
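for reference, a minimal overlap-save sketch with a fixed FIR.  the function name overlap_save_fir and the padding scheme (L-1 zeros of history in front, zeros behind so that every block is full) are assumptions of this example.  each frame's H valid samples are written straight into the output, not added, which is the butt splice.

```python
import numpy as np

def overlap_save_fir(x, h, N):
    """Sketch: overlap-save ("overlap-scrap") with N-point FFTs, H = N - L + 1."""
    L = len(h)
    H = N - L + 1                       # valid output samples per frame
    if H < 1:
        raise ValueError("N must be at least len(h)")
    Hf = np.fft.rfft(h, N)
    n_out = len(x) + L - 1
    n_frames = -(-n_out // H)           # ceiling division
    # L-1 zeros of history in front, zeros behind so the last block is full
    xp = np.concatenate([np.zeros(L - 1), x, np.zeros(n_frames * H - len(x))])
    y = np.empty(n_frames * H)
    for k in range(n_frames):
        block = xp[k * H:k * H + N]     # N raw input samples: no window, no zero-padding
        circ = np.fft.irfft(np.fft.rfft(block) * Hf, N)
        y[k * H:(k + 1) * H] = circ[L - 1:]   # scrap the first L-1 time-aliased samples
    return y[:n_out]

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
h = rng.standard_normal(33)
y = overlap_save_fir(x, h, 128)
print(np.allclose(y, np.convolve(x, h)))
```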

but if it is overlap-add then the F samples in the frame are zero-padded with 
N-F zeros and there is a form of crossfading, even with a rectangular window, 
from one frame to the next if the FIR filter definition changes.
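a small experiment that makes the difference visible, under assumed toy values: a constant input, two frames, and two made-up moving-average filters (filt_a with DC gain 1 for frame 0, filt_b with DC gain 0.25 for frame 1).  none of these come from the thread; they are chosen only so the transition is easy to read off.

```python
import numpy as np

L, N = 9, 32
H = N - L + 1                       # 24; rectangular window, so F = H
x = np.ones(2 * H)                  # constant input makes the gain change easy to see
filt_a = np.ones(L) / L             # hypothetical filter for frame 0, DC gain 1
filt_b = 0.25 * np.ones(L) / L      # hypothetical filter for frame 1, DC gain 0.25
filters = [filt_a, filt_b]

# overlap-add: zero-padded frames, each L-1 sample tail is summed into the next frame
ola = np.zeros(2 * H + L - 1)
for k, h in enumerate(filters):
    frame = x[k * H:(k + 1) * H]
    ola[k * H:k * H + N] += np.fft.irfft(np.fft.rfft(frame, N) * np.fft.rfft(h, N), N)

# overlap-save: full N-sample blocks, the first L-1 outputs of each block are scrapped
xp = np.concatenate([np.zeros(L - 1), x])
ols = np.empty(2 * H)
for k, h in enumerate(filters):
    block = xp[k * H:k * H + N]
    ols[k * H:(k + 1) * H] = np.fft.irfft(np.fft.rfft(block) * np.fft.rfft(h, N), N)[L - 1:]

# samples around the frame boundary at index H
print(np.round(ola[H - 1:H + L], 3))   # ramps from 1 down to 0.25 over L-1 samples
print(np.round(ols[H - 1:H + L], 3))   # jumps from 1 to 0.25 at the boundary
```

in this toy case the overlap-add output crosses from the old gain to the new one over the L-1 samples where the tail of frame 0 overlaps the onset of frame 1, while the overlap-save output switches in a single sample.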

if you cut your frame hop size, H, from F to nearly half (F+1)/2 (and use a 
complementary window such as Hann), it is half as efficient, but the crossfade 
is even smoother (and the frame rate is faster, so the filter definition can 
change more often).

all of this is well-established knowledge regarding frame-by-frame processing 
with windows and the FFT.

--
 
r b-j                  r...@audioimagination.com
 
"Imagination is more important than knowledge."
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
