On 20 April 2012 23:50, Chris Cannam <chris.can...@eecs.qmul.ac.uk> wrote:
> Mark Dolson's CARL phase vocoder
> (http://www.crca.ucsd.edu/cmusic/cmusic.html) from 1984 uses window
> presum if the FFT is shorter than the window, and includes
> resynthesis.
>
> I suppose the idea is to reduce some of the annoying artifacts of
> phase vocoder scaling at the expense of time-domain aliasing, which
> may be more tolerable for some signals and listeners. My impression is
> it doesn't work all that well, but it may have made a significant
> difference to computational cost in 1984. The code is worth reading
> though, it's clear and has illuminating comments.

I know of this code (through Richard Dobson's modification); that's how
I found out about the technique in the first place (though there it is
called "double windowing"). I must say, though, that the code itself
struck me as quite the opposite of clear and illuminating: rather like
typical "cryptic Matlab code with three-letter variables" converted
into "obfuscated C"... The original pvoc.c code, from which both the
Csound and RWD sources seem to be derived, actually has most of the
functionality in a giant main() function (::shocked::)...

More importantly, yes, the code does have resynthesis, and the way it
avoids the flanging (or, for larger frame sizes, echo) artefacts that
you get from adding the frame-size-delayed copy of the signal to
itself is by applying a sinc to both the analysis and synthesis
windows, but with different 'periods' for analysis and synthesis:
- analysis: x = Pi * i / frameSize
- synthesis: x = Pi * i / hopSize
in order to satisfy the condition "window[N*i] = 0 for i != 0". None
of the papers or books I've read so far that discuss the technique
mention this condition or the sinc pass, so I'm not sure where it was
pulled from. It does seem crucial, however, and it makes the whole
thing "just work", although I still don't quite understand how...
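For what it's worth, here is a small numerical sketch of that condition (Python/NumPy, my own illustration, not the CARL code), assuming the sinc is applied to a centred index, a Hann prototype, and a window length that is a multiple of the frame size:

```python
import numpy as np

# Hypothetical parameters (not taken from pvoc.c): a Hann prototype
# four times the FFT frame size, with the analysis sinc
# "x = Pi * i / frameSize".
fft_size = 256                            # N: FFT frame size
win_size = 4 * fft_size                   # M: total window length
i = np.arange(win_size) - win_size // 2   # centred sample index

# np.sinc(x) = sin(pi*x)/(pi*x), so np.sinc(i / fft_size) is the
# sinc with 'period' frameSize from the post.
analysis_win = np.hanning(win_size) * np.sinc(i / fft_size)

# The condition "window[N*i] = 0 for i != 0": the window crosses zero
# at every multiple of the FFT frame size away from the centre.
centre = win_size // 2
for m in (-fft_size, fft_size):
    assert abs(analysis_win[centre + m]) < 1e-12
```

The synthesis window would use the hop size in place of fft_size, as described above.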

As for the "impression": I find that this approach consistently gives
better results with the phase vocoder. In particular it allows better
transient handling, because it allows shorter FFT frames without loss
of actual spectral resolution (and it still seems to be used by
Csound).
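To illustrate mechanically what "shorter FFT frames without loss of spectral resolution" means, here is a sketch of the presum fold itself (Python/NumPy, with made-up sizes; the names are mine, not the CARL code's): the frame is windowed at full length M, then folded down to N samples before a length-N FFT.

```python
import numpy as np

def presum_frame(segment, window, fft_size):
    """Window an M-sample segment, then fold it into fft_size samples.

    Assumes len(window) is a multiple of fft_size. The fold deliberately
    introduces time-domain aliasing; the sinc-modified window is what
    keeps it tolerable.
    """
    windowed = segment * window
    return windowed.reshape(-1, fft_size).sum(axis=0)

fft_size = 256
window = np.hanning(4 * fft_size)   # full-length analysis window
segment = np.random.default_rng(0).standard_normal(window.size)

# Only a length-256 FFT is computed, but the bins are shaped by the
# frequency response of the full 1024-sample window.
spectrum = np.fft.rfft(presum_frame(segment, window, fft_size))
```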
Compared to zero padding it is not only more efficient, it is also
different in the sense that there is no "zero padding tail" to discard
after the FFT. Discarding the tail is something you want when, for
example, you do multiplicative spectral modifications and want to get
rid of time-domain aliasing, but it is not something you want when you
implement something like a time reverser in the spectral domain
(because then your time-reversed signal ends up in the tail, which you
discard)...
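The "no tail" point can be checked numerically: folding an M-sample windowed frame down to N samples and taking an N-point FFT yields exactly every K-th bin (K = M/N) of the M-point FFT, i.e. presum samples the long-frame spectrum, whereas zero padding interpolates it. A sketch (Python/NumPy, my own illustration, not from the sources discussed):

```python
import numpy as np

fft_size = 128       # N: short FFT size
presum_factor = 4    # K = M / N
rng = np.random.default_rng(1)
frame = rng.standard_normal(presum_factor * fft_size)  # an already-windowed M-sample frame

# Fold M samples down to N, then take the short FFT.
folded = frame.reshape(presum_factor, fft_size).sum(axis=0)
spec_folded = np.fft.fft(folded)

# The long FFT of the unfolded frame, decimated to every K-th bin,
# matches the folded result exactly (up to rounding).
spec_full = np.fft.fft(frame)
assert np.allclose(spec_folded, spec_full[::presum_factor])
```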


--
"What Huxley teaches is that in the age of advanced technology, spiritual
devastation is more likely to come from an enemy with a smiling face than
from one whose countenance exudes suspicion and hate."
Neil Postman
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp 
links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp