On Sunday, 29 November 2015 at 16:15:32 UTC, Guillaume Piolat wrote:
There is also a sample-wise FFT I've come across, which is
expensive but avoids chunking.
Hm, I don't know what that is :).
Looking for similar grains is the idea behind the popular
autocorrelation pitch detection methods. They require at least
two periods in the analysis window, though, else there is no
autocorrelation peak. Rumor has it that the non-realtime
Autotune works that way, along with many modern pitch detection
methods.
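To make the two-periods point concrete, here is a minimal numpy sketch of autocorrelation pitch detection (my own illustration, not Autotune's actual algorithm; the frame size and frequency bounds are arbitrary picks):

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Estimate the pitch of one frame from its autocorrelation peak.

    The frame must contain at least two periods of the lowest
    frequency you want to detect, or there is no peak to find.
    """
    x = x - np.mean(x)
    # Full autocorrelation, keep the non-negative lags only.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(r[lo:hi])     # strongest repeat within range
    return sr / lag

sr = 44100
t = np.arange(2048) / sr               # ~46 ms: several periods of 220 Hz
frame = np.sin(2 * np.pi * 220.0 * t)
print(autocorr_pitch(frame, sr))       # ~220 Hz (quantized to integer lag)
```

Note the estimate is quantized to integer lags; real detectors interpolate around the peak for sub-sample accuracy.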
I thought they used Laroche and Dolson's FFT-based one combined
with a peak detector, but maybe that was the real-time version.
There are other full spectral resynthesis methods that throw
away phase information and represent each spectral component as
bandpass-filtered noise. That is rather expressive, since you
can do morphing with it (like you can with images). But since
you throw away phase information, I guess some attacks suffer,
so you have to special-case the attacks as "residue" samples
that are left in the time domain (the difference between what
you can represent as spectral components and the leftover bits).
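A toy sketch of the magnitude-only idea: give each STFT bin a random phase, which is roughly equivalent to exciting each band with noise, instead of building literal bandpass filters. All parameters here are just illustrative choices of mine, not from any particular paper:

```python
import numpy as np

def stft_mag(x, win=1024, hop=256):
    """Magnitude spectrogram: the phase is deliberately discarded."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def noise_resynth(mag, win=1024, hop=256, seed=0):
    """Rebuild audio from magnitudes alone; a fresh random phase per
    bin per frame acts like noise shaped by the spectral envelope."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, mag.shape)
    frames = np.fft.irfft(mag * np.exp(1j * phase), n=win, axis=1)
    out = np.zeros(hop * len(frames) + win)
    w = np.hanning(win)
    for k, f in enumerate(frames):     # windowed overlap-add
        out[k * hop:k * hop + win] += f * w
    return out

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
y = noise_resynth(stft_mag(x))
# Morphing is then just interpolating two magnitude spectrograms,
# e.g. noise_resynth(0.5 * mag_a + 0.5 * mag_b) for hypothetical
# spectrograms mag_a and mag_b of the same shape.
```

Transients smear badly under this scheme, which is exactly why the time-domain residue trick above is needed.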
I don't know what "voicedness" is? You mean things like
vibrato is the pitch variation that occurs when the larynx is
Yes, so that will generate sidebands in the frequency spectrum,
like FM synthesis, right? So in order to pick up fast vibrato, I
would assume you would also need to do analysis of the spectrum.
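The FM analogy can be checked numerically: a sine with sinusoidal vibrato really does put energy in sidebands spaced at the vibrato rate around the carrier. A small sketch (the numbers are arbitrary; with a 1 s signal the FFT bin index equals the frequency in Hz):

```python
import numpy as np

sr, N = 8000, 8000                     # 1 s of signal, 1 Hz bin spacing
t = np.arange(N) / sr
f0, fv, dev = 1000.0, 6.0, 30.0        # carrier, vibrato rate, depth (Hz)
# Vibrato is frequency modulation: integrate the instantaneous
# frequency f0 + dev*sin(2*pi*fv*t) to get the phase.
phase = 2 * np.pi * f0 * t - (dev / fv) * np.cos(2 * np.pi * fv * t)
x = np.sin(phase)
spec = np.abs(np.fft.rfft(x)) / N
# Energy appears not just at 1000 Hz but at 1000 +/- 6, +/- 12, ... Hz,
# with amplitudes following Bessel functions of the modulation index.
for f in (1000, 994, 1006, 1012):
    print(f, spec[f])
```

So a fast, deep vibrato genuinely spreads the partial across the spectrum, which is what a frame-by-frame pitch tracker struggles with.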
voicedness is the difference between sssssss(unvoiced) and
A phoneme is voiced when there is periodic glottal closure and
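For what it's worth, one crude numerical proxy for voicedness is the height of the normalized autocorrelation peak: near 1 for a periodic (voiced) frame, near 0 for hiss like "ssss". A sketch with arbitrary lag bounds of my own choosing:

```python
import numpy as np

def voicedness(x, sr, fmin=50.0, fmax=500.0):
    """Normalized autocorrelation peak as a rough voicedness score:
    ~1 for periodic (voiced) frames, ~0 for noise (unvoiced)."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    return float(np.max(r[lo:hi]) / r[0])

sr = 16000
t = np.arange(1024) / sr
rng = np.random.default_rng(0)
v_sine = voicedness(np.sin(2 * np.pi * 150.0 * t), sr)   # ~0.9
v_noise = voicedness(rng.standard_normal(1024), sr)      # ~0.1
print(v_sine, v_noise)
```

Real detectors threshold a score like this, which is where the "half a lie" problem mentioned below comes from.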
Ah! In the 90s I read a paper in Computer Music Journal where
they did song synthesis by emulating the vocal tract as a
"physical" filter model. I'm not sure if they used FoF for
generating the sound. I think there was a vinyl flexi disc with
it too. :-) I have it somewhere...
You might find it interesting.
When the sound isn't voiced, there is no period. There isn't a
"pitch" there. So pitch detection tends to come with a
So it is a problem for real time, but in non-real time you can
work your way backwards and fill in the missing parts before
doing resynthesis? I guess?
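In a sketch, the offline gap filling could be as simple as interpolating the pitch track across the unvoiced stretches, since you can look both forwards and backwards (a hypothetical track; real systems are more careful about octave jumps and boundaries):

```python
import numpy as np

# Hypothetical pitch track in Hz, NaN where the detector said "unvoiced".
f0 = np.array([220.0, 221.0, np.nan, np.nan, 230.0, 231.0])
idx = np.arange(len(f0))
voiced = ~np.isnan(f0)
# Offline, interpolate across the gap using both its neighbours.
filled = np.interp(idx, idx[voiced], f0[voiced])
print(filled)   # the NaN gap becomes 224, 227
```

A real-time tracker only has the left neighbour at that point, which is exactly the asymmetry being described.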
The devil in that is that voicedness itself is half a lie, or
let's say a leaky abstraction; it breaks down for distorted
Right. You have a lot of these problems in sound analysis, like
sound separation. The brain is so impressive. I still have
problems understanding how we can hear in 3D with two ears, like
distinguishing above from below. I understand the basics of it,
but it is still impressive when you try to figure out _how_.
I guess that's why IRCAM can sell licenses to superVP. :)
Their papers on that topic are interesting: they group spectral
peaks by formants and move them together.
I've read the Laroche and Dolson paper in detail, and more or
less know it by heart now, but maybe you are thinking about some
other paper? Their paper was good on the science part, but they
leave the artistic engineering part open to the reader... ;-)
More insight on the artistic engineering part is most welcome!!