On 20 April 2012 23:50, Chris Cannam <chris.can...@eecs.qmul.ac.uk> wrote:
> Mark Dolson's CARL phase vocoder
> (http://www.crca.ucsd.edu/cmusic/cmusic.html) from 1984 uses window
> presum if the FFT is shorter than the window, and includes
> resynthesis.
>
> I suppose the idea is to reduce some of the annoying artifacts of
> phase vocoder scaling at the expense of time-domain aliasing, which
> may be more tolerable for some signals and listeners. My impression is
> it doesn't work all that well, but it may have made a significant
> difference to computational cost in 1984. The code is worth reading
> though, it's clear and has illuminating comments.
I know of this code (through Richard Dobson's modification of it); that's how I found out about the technique in the first place (though there it is called "double windowing"). Although I must say the code itself struck me as quite the opposite of clear and illuminating, rather like typical "cryptic Matlab code with three-letter variables" converted into "obfuscated C"... The original pvoc.c code, from which both the Csound and RWD sources seem to be derived, actually has most of the functionality in a giant main() function (::shocked::)...

More importantly, yes, the code does have resynthesis, and the way it avoids the flanging artefacts (or echo, for larger frame sizes) that you get from adding the frame-size-delayed copy of the signal to itself is by applying a sinc to the analysis and synthesis windows, but with different 'periods' for analysis and synthesis:

- analysis:  x = Pi * i / frameSize
- synthesis: x = Pi * i / hopSize

in order to satisfy the condition "window[Ni] = 0 for i != 0". None of the papers or books (that I've read so far) discussing the technique mention this condition or the sinc pass, so I'm not sure where it was pulled from. It does seem crucial, however, and it makes the whole thing "just work", although I still don't quite understand how...

As for the "impression": I find that this approach consistently gives better results with the phase vocoder. In particular it allows better transient handling, because it allows for shorter FFT frames without loss of actual spectral resolution (and it still seems to be used by Csound). Compared to zero padding it is not only more efficient, it is also different in the sense that there is no "zero padding tail" to discard after the FFT.
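To make the zero condition concrete, here is a minimal numpy sketch of the idea (my own illustration, not the CARL code; the function names and the Hamming taper are assumptions): the sinc factor puts zeros at multiples of the FFT length, and the "presum" step folds the long windowed frame down to FFT size before transforming:

```python
import numpy as np

def presum_window(win_len, fft_len):
    """Long analysis window times a sinc whose zeros fall on multiples
    of fft_len, so window[k * fft_len] == 0 for k != 0 (offsets taken
    from the window centre).  Taper choice is an assumption."""
    n = np.arange(win_len) - win_len // 2   # integer offsets from centre
    # np.sinc(x) = sin(pi*x)/(pi*x): zeros at nonzero integers,
    # i.e. here at n = k * fft_len, k != 0
    return np.hamming(win_len) * np.sinc(n / fft_len)

def presum_fold(windowed_frame, fft_len):
    """The 'presum' step: time-alias (fold) the long windowed frame
    into fft_len samples before taking the FFT."""
    assert len(windowed_frame) % fft_len == 0
    return windowed_frame.reshape(-1, fft_len).sum(axis=0)

# e.g. a 1024-sample window folded into a 256-point FFT frame
fft_len, win_len = 256, 4 * 256
w = presum_window(win_len, fft_len)
frame = presum_fold(w * np.random.randn(win_len), fft_len)
spectrum = np.fft.rfft(frame)
```

With the zeros placed this way, the folded window behaves like a single period as far as the short FFT is concerned, which is (as far as I can tell) why the round trip "just works".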
The discarding of the tail is something you want when you, for example, do multiplicative spectral modifications and want to get rid of time-domain aliasing, but it is not something you want when you implement something like a time reverser in the spectral domain (because then your time-reversed signal ends up in the tail, which you discard)...
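The "reversed signal ends up in the tail" point can be seen with a tiny numpy experiment (mine, not from the pvoc code): conjugating the spectrum performs a circular time reversal, so with a zero-padded frame the reversed samples land exactly in the padding region you would otherwise throw away:

```python
import numpy as np

# 3-sample signal zero-padded to an 8-point FFT frame
x = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# spectral conjugation <=> circular time reversal: y[n] = x[(-n) mod N]
y = np.fft.ifft(np.conj(np.fft.fft(x))).real

# y is [1, 0, 0, 0, 0, 0, 3, 2]: the reversed samples 3 and 2 have
# wrapped around into the zero-padding "tail" of the frame
```

So a pipeline that unconditionally discards the padded tail after the inverse FFT would silently drop most of the reversed signal here.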