On Sunday, 29 November 2015 at 15:34:34 UTC, Ola Fosheim Grøstad wrote:

I don't know much about current pitch trackers, but I think you can do a high quality one for voice using filterbanks. Some people do resynthesis that way (and well, that is just an alternative to FFT after all).

You are precisely right: if you don't need reconstruction, nothing forces you to use the FFT! There is also a sample-wise FFT I've come across, which is expensive but avoids chunking.


I assume you can make a better pitch tracker that is specialized for voice by thinking about FoF synthesis. The sound of the voice is really a sequence of bursts of roughly the same shape (like granular synthesis in a way), and you should be able to figure out some statistical relationship between formants and how they change with pitch.

Looking for similar grains is the idea behind the popular auto-correlation pitch detection methods. They need at least two periods in the analysis window though, otherwise there is no autocorrelation peak. Rumor has it that the non-realtime Autotune works this way, as do many modern pitch detection methods.
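
To make the autocorrelation idea concrete, here is a minimal sketch in Python. It is not how Autotune or any particular product does it. The function name `autocorr_pitch` and the search bounds `fmin`/`fmax` are assumptions picked for illustration, and the "two periods" rule is enforced conservatively as two periods of the lowest pitch searched.

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Return a pitch estimate in Hz, or None if the frame is too short.

    fmin and fmax are assumed search bounds for a voice.
    """
    min_lag = int(sample_rate / fmax)  # shortest period searched, in samples
    max_lag = int(sample_rate / fmin)  # longest period searched, in samples
    # Without at least two periods in the frame there is nothing to correlate with.
    if len(frame) < 2 * max_lag:
        return None
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw autocorrelation: how similar is the frame to itself shifted by `lag`?
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone sampled at 8 kHz (ten periods in the frame).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
estimate = autocorr_pitch(tone, sample_rate)
```

The brute-force loop is O(N * lags). A real tracker would more likely compute the autocorrelation through an FFT and refine the peak by interpolation.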


I'm not saying it is easy. Probably a lot published on this though.

I don't know what "voicedness" is? You mean things like vibrato?

Vibrato is the pitch variation that occurs when the larynx is well relaxed.

Voicedness is the difference between sssssss (unvoiced) and zzzzzz (voiced). A phoneme is voiced when the glottis closes and opens periodically.

When the sound isn't voiced, there is no period, so there isn't a "pitch" there. That is why pitch detectors tend to come with a confidence measure.
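
One common way to get such a confidence value is to take the peak of the normalized autocorrelation: close to 1 for a periodic (voiced) frame, much lower for noise. The sketch below assumes that approach. The name `periodicity_confidence`, the lag range and the test signals are all made up for illustration.

```python
import math
import random

def periodicity_confidence(frame, min_lag, max_lag):
    """Return (best_lag, confidence); confidence is the normalized
    autocorrelation peak, between -1 and 1."""
    best_lag, best_conf = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        a = frame[:len(frame) - lag]  # frame without its tail
        b = frame[lag:]               # same frame shifted by `lag`
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        conf = num / den if den > 0 else 0.0
        if conf > best_conf:
            best_lag, best_conf = lag, conf
    return best_lag, best_conf

# Usage: a periodic tone ("zzzz"-like) versus white noise ("ssss"-like).
sample_rate = 8000
voiced = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(400)]

_, voiced_conf = periodicity_confidence(voiced, 16, 100)
_, unvoiced_conf = periodicity_confidence(unvoiced, 16, 100)
```

A tracker would then report a pitch only when the confidence is above some threshold. Where to put that threshold is a tuning decision.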

The devil in that is that voicedness itself is half a lie, or let's say a leaky abstraction: it breaks down for distorted vocals.

I guess that's why IRCAM can sell licenses to superVP. :)

Their papers on that topic are interesting: they group spectral peaks by formant and move them together.



