> On March 10, 2020 11:34 AM Spencer Russell <s...@media.mit.edu> wrote:
> 
>  
> Thanks for your expanded notes, RBJ. I haven't found anything that I disagree 
> with or that contradicts what I was saying earlier - I'm not sure if they 
> were intended as expanded context or if there was something you were 
> disagreeing with.

i think we're all on the same page.  i still see the Venn diagram of this as 
having a very small intersection between the notion of "Fast Convolution" 
(which has two techniques called "Overlap-Add" and "Overlap-Scrap", a.k.a. 
"Overlap-Save") and STFT analysis/resynthesis processing, of which the Phase 
Vocoder is the most common application that i see.

> 
> On March 8, 2020 7:55 PM Ethan Duni <ethan.d...@gmail.com> wrote:
> > 
> > Fast FIR is a different thing than an FFT filter bank.
> > 
> > You can combine the two approaches but I don’t think that’s what is being 
> > done here?

and that was my question, too.  but i think that we're all on the same page; 
we might just have semantic issues or differences in details.

> 
> The point I'm making here is that overlap-add fast FIR is a special case of 
> STFT-domain multiplication and resynthesis. I'm defining the standard STFT 
> pipeline here as:
> 

i think i am in total agreement, but i want to point out where i see the small 
intersection on the Venn diagram between the STFT thingie and fast-convolution.

> 1. slice your signal into frames

(a) but, in some cases the frames are overlapping.  in simple 
fast-convolution using overlap-add, there is no overlap: the rectangular 
window width, F, and the frame hop, H, are the same number.

> 2. pointwise-multiply an analysis window by each frame

(b) but with fast-convolution using overlap-scrap (or "overlap-save") there is 
no windowing of any kind.  the frame width is F=N but the frame hop is H<N, so 
N-H samples are retained from the previous frame, and the length of the FIR is 
L=N-H+1.
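
to make (b) and (g) concrete, here's a minimal numpy sketch of overlap-scrap, assuming a real input `x`, a real FIR `h` of length L, and FFT size N (`overlap_scrap` is just an illustrative name, not from any library).  the hop falls out as H = N-L+1, each frame is N raw samples with no window, and only the last H samples of each circular convolution are kept.

```python
import numpy as np

def overlap_scrap(x, h, N):
    """Fast FIR filtering by overlap-scrap (a.k.a. overlap-save).

    Frame width is N (no window), hop is H = N - L + 1, so the FIR
    length is L = N - H + 1 and N - H samples carry over per frame.
    Returns the first len(x) output samples (no tail).
    """
    L = len(h)
    H = N - L + 1                          # new input samples per frame
    if H < 1:
        raise ValueError("FIR length must be <= N")
    Hk = np.fft.rfft(h, N)                 # spectrum of h zero-padded to N
    n_frames = -(-len(x) // H)             # ceil(len(x) / H)
    # N - H zeros of "history" in front, zeros at the end to fill the last frame
    buf = np.concatenate([np.zeros(N - H), x, np.zeros(n_frames * H - len(x))])
    y = np.empty(n_frames * H)
    for m in range(n_frames):
        frame = buf[m * H : m * H + N]                   # N samples, hop H
        yc = np.fft.irfft(np.fft.rfft(frame) * Hk, N)    # circular convolution
        # keep the H good samples, scrap the N-H time-aliased ones
        y[m * H : (m + 1) * H] = yc[N - H:]
    return y[: len(x)]
```

comparing the result against `np.convolve(x, h)[:len(x)]` on random data is a quick sanity check.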

(c) for the most efficient fast-convolution using overlap-add, the frame (or 
window) width is F<N, the window is rectangular so we're multiplying by 1 
(which is just a copy operation), and there are N-F zeros padded.
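
here's a minimal numpy sketch of the overlap-add case in (a) and (c), under the same assumptions (real `x`, real FIR `h` of length L, FFT size N; `overlap_add` is an illustrative name): frames of width F with hop H = F, a rectangular window that is just a copy, N-F zeros of padding, and the full N-sample result of each frame added into the output.

```python
import numpy as np

def overlap_add(x, h, N, F):
    """Fast FIR filtering by overlap-add.

    Frame width F = hop H (rectangular window, no frame overlap), each
    frame zero-padded by N - F samples. Linear convolution is exact as
    long as L = len(h) <= N - F + 1. Returns all len(x) + L - 1 samples.
    """
    L = len(h)
    if L > N - F + 1:
        raise ValueError("need len(h) <= N - F + 1 to avoid time aliasing")
    Hk = np.fft.rfft(h, N)                      # spectrum of h zero-padded to N
    y = np.zeros(len(x) + N)                    # room for the last frame's tail
    for start in range(0, len(x), F):
        frame = x[start : start + F]            # rectangular window: plain copy
        yk = np.fft.irfft(np.fft.rfft(frame, N) * Hk, N)   # rfft pads to N
        y[start : start + N] += yk              # overlap-add the N-sample block
    return y[: len(x) + L - 1]
```

the guard on L is exactly the N-F+1 condition discussed under step 4; past that, each frame's circular convolution would wrap around.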

> 3. perform `rfft` on each frame to give the STFT domain representation
> 4. modify the STFT representation

(d) now right here, "modify" can mean a whole shitload of different things, 
depending on the application.  but, for conceptualization, if you take whatever 
the result of the modification is and complex-divide it, bin by bin, by 
whatever the spectrum was before modification, you can, assuming no 
divides-by-zero, think of that modification as a point-by-point multiplication 
or pseudo-multiplication by a "frequency response" or pseudo-frequency-response.
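
as a sketch of that way of thinking (numpy; the function name and the `eps` threshold are illustrative assumptions), any bin-by-bin modification can be turned into a pseudo-frequency-response by complex division, with bins where the original spectrum is (nearly) zero left undefined.  running the inverse rfft on it gives the pseudo-impulse-response, whose length is what matters for time aliasing.

```python
import numpy as np

def pseudo_frequency_response(X, Y, eps=1e-12):
    """View a modification X -> Y of one frame's spectrum as a bin-by-bin
    multiplication Y = G * X, i.e. G = Y / X. Bins with |X| <= eps would be
    divides-by-zero, so G is left as NaN there."""
    G = np.full(X.shape, np.nan + 0j)
    ok = np.abs(X) > eps
    G[ok] = Y[ok] / X[ok]
    return G

# Example: a modification that happens to be a circular delay of 3 samples.
N = 64
frame = np.random.default_rng(2).standard_normal(N)
X = np.fft.rfft(frame)
k = np.arange(len(X))
Y = X * np.exp(-2j * np.pi * k * 3 / N)        # the "modified" spectrum
G = pseudo_frequency_response(X, Y)
g = np.fft.irfft(G, N)                         # pseudo-impulse-response
```

in this toy case `g` comes out as a unit impulse at n = 3, i.e. a pseudo-impulse-response with a non-zero length of just 1 sample.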

**if** that frequency response is the DFT of an impulse response or 
pseudo-impulse-response that has non-zero length no longer than N-F+1 (so there 
are F-1 zeros at the end of the impulse response or pseudo-impulse response), 
there will be no time aliasing as a consequence of this modification operation. 
 however, if that modification is equivalent to multiplying by a frequency 
response that corresponds to an impulse response that *is* longer than N-F+1, 
then time-domain aliasing occurs, but we might hope that it's around the side 
tails of the synthesis window and will be attenuated.
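
a quick numpy check of that condition (a sketch; `aliasing_error` is a hypothetical helper and the random data are arbitrary): an F-sample frame zero-padded to N, multiplied in the rfft domain by the spectrum of an L-sample impulse response, matches linear convolution when L <= N-F+1 and picks up a wrapped-around sample as soon as L = N-F+2.

```python
import numpy as np

def aliasing_error(F, L, N, seed=0):
    """Max |circular - linear| difference over the N output samples when an
    F-sample frame (zero-padded to N) is filtered in the rfft domain by an
    L-sample impulse response (also zero-padded to N)."""
    if F > N or L > N:
        raise ValueError("frame and impulse response must each fit in N")
    rng = np.random.default_rng(seed)
    frame = rng.standard_normal(F)
    h = rng.standard_normal(L)
    circ = np.fft.irfft(np.fft.rfft(frame, N) * np.fft.rfft(h, N), N)
    lin = np.convolve(frame, h)              # F + L - 1 samples
    lin_N = np.zeros(N)
    m = min(N, len(lin))
    lin_N[:m] = lin[:m]                      # linear result, truncated to N
    return np.max(np.abs(circ - lin_N))

N, F = 64, 16
ok = aliasing_error(F, N - F + 1, N)         # round-off only: no time aliasing
bad = aliasing_error(F, N - F + 2, N)        # the last linear sample wraps to n = 0
```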

> 5. perform `irfft` on each frame
> 6. pointwise-multiply a synthesis window on each frame

(e) now, no synthesis window of this kind is used in regular-old 
fast-convolution.  but in a phase vocoder there very well may be one.  Miller 
Puckette and Jean Laroche mentioned using a Hann for the analysis window and 
another Hann for the synthesis window; they argue that the net effect is Hann^2 
and show that, with 75% overlap, it adds up to a constant that can be scaled to 
1 (but i wouldn't call the Hann^2 "complementary").

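a small numpy check of the window arithmetic (a sketch assuming periodic, unit-height Hann windows of length N; `overlap_sum` is a hypothetical helper): Hann alone sums to 1 at 50% overlap, Hann^2 does not, and at 75% overlap Hann^2 sums to the constant 3/2, so a fixed gain of 2/3 brings it to 1.

```python
import numpy as np

def overlap_sum(w, hop, n_frames=16):
    """Add n_frames copies of window w, each shifted by hop, and return the
    interior where every sample is covered by the same number of frames."""
    N = len(w)
    total = np.zeros(hop * (n_frames - 1) + N)
    for m in range(n_frames):
        total[m * hop : m * hop + N] += w
    return total[N:-N]

N = 1024
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)    # periodic, unit-height Hann

hann_50 = overlap_sum(hann, N // 2)             # constant 1
hann2_50 = overlap_sum(hann ** 2, N // 2)       # ripples between 0.5 and 1.0
hann2_75 = overlap_sum(hann ** 2, N // 4)       # constant 3/2
```
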
(f) but i am suspicious that the analysis Hann window survives the 
modification in step #4.  in my 2001 Intraframe Time-Scaling paper 
(https://www.researchgate.net/publication/3927319_Intraframe_time-scaling_of_nonstationary_sinusoids_within_the_phase_vocoder), 
my suggestion is to use a Gaussian window that gets very close to zero at the 
cutoff tails.  i show that this modification, done in the frequency domain, 
preserves the Gaussian window for each frequency component.  then, when the 
iFFT is done, we know we have a Gaussian window, **but** that is not 
complementary.  so we then multiply (in the time domain) by a synthesis window 
that is Hann divided by Gaussian, which turns the net window into a 
complementary Hann.  (after that we overlap-add.)
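
a minimal numpy sketch of just that window bookkeeping, assuming a periodic Hann, a Gaussian centered in the frame with sigma = N/8 (an arbitrary choice here, not a value from the paper), and no spectral modification at all: Gaussian analysis times Hann/Gaussian synthesis lands on a Hann, which overlap-adds to 1 at 50% overlap.

```python
import numpy as np

N = 1024
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)        # periodic Hann (complementary at 50%)
sigma = N / 8                                        # assumed Gaussian width
gauss = np.exp(-0.5 * ((n - N / 2) / sigma) ** 2)    # analysis window, about 3e-4 at the edges
synth = hann / gauss                                 # synthesis window: Hann / Gaussian
effective = gauss * synth                            # net window on each frame

# Overlap-add the net window at hop N/2 and look at the fully covered interior.
hop, n_frames = N // 2, 10
total = np.zeros(hop * (n_frames - 1) + N)
for m in range(n_frames):
    total[m * hop : m * hop + N] += effective
steady = total[N:-N]                                 # should be all ones
```

with this sigma the Hann/Gaussian synthesis window rises well above 1 toward the frame edges before falling back to 0, so anything sitting out there in a modified frame gets boosted; the Gaussian width is a tradeoff this sketch doesn't try to settle.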

> 7. overlap-add each frame to get the resulting time-domain signal

(g) except for overlap-scrap fast convolution.  in that case, you output (or 
"save") the H correct samples (fewer than F=N), scrap the N-H samples that are 
contaminated by time aliasing, and move on to the next frame.
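
to put Spencer's seven steps next to caveats (a) through (g), here's a hedged numpy sketch of the generic pipeline (`stft_process` and its parameters are made-up names, not anyone's library API).  plain overlap-add fast convolution falls out as the special case of a rectangular analysis window of width F, hop H = F, a modification that multiplies by the rfft of h, and no synthesis window.  overlap-scrap does not fit this template, since it discards samples instead of adding them.

```python
import numpy as np

def stft_process(x, analysis, hop, N, modify, synthesis=None):
    """Generic frame / window / rfft / modify / irfft / window / overlap-add.

    analysis  : analysis window of length F <= N (frames are zero-padded to N)
    hop       : frame hop H
    modify    : function mapping one frame's rfft (N//2 + 1 bins) to a new one
    synthesis : optional synthesis window of length N
    """
    F = len(analysis)
    n_frames = -(-len(x) // hop)                          # ceil(len(x) / hop)
    xp = np.concatenate([x, np.zeros(n_frames * hop + F - len(x))])
    y = np.zeros(n_frames * hop + N)
    for m in range(n_frames):
        frame = xp[m * hop : m * hop + F] * analysis      # steps 1-2
        X = np.fft.rfft(frame, N)                         # step 3 (pads to N)
        frame_out = np.fft.irfft(modify(X), N)            # steps 4-5
        if synthesis is not None:
            frame_out = frame_out * synthesis             # step 6
        y[m * hop : m * hop + N] += frame_out             # step 7
    return y

# Overlap-add fast FIR as a special case: rectangular window, hop = F, no synthesis window.
rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
h = rng.standard_normal(49)
N, F = 64, 16                                             # len(h) <= N - F + 1
Hk = np.fft.rfft(h, N)
y = stft_process(x, np.ones(F), F, N, lambda X: X * Hk)
```

the first len(x)+len(h)-1 samples of `y` should match `np.convolve(x, h)`; swapping in a Hann analysis window with F = N, hop N/2 and an identity `modify` turns the same function into plain STFT analysis/resynthesis.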

these are the "little details" that i am picking nits about.

but Spencer, i think that you and Ethan and i are all on the same page.  i'm not 
picking on Zhiguang Zhang (not sure which is your "given name" and which is 
your surname or family name, and i wouldn't mind if you make it clear, or if 
you want us to use "Eric" instead), but i just want to make these little 
details clear to him (and to correct a few actual falsehoods, such as the claim 
that the DFT is TI, i.e. time-invariant).  i just want to reiterate what Ethan 
said: this FFT fast convolution (which we use for convolutional reverbs and 
such) is not really the same thing, conceptually, as STFT processing, even 
though both use the terms "frame" and "overlap-add".

"that's my story and i'm sticking to it."

--
 
r b-j                  r...@audioimagination.com
 
"Imagination is more important than knowledge."
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
