On Sunday, 29 November 2015 at 15:34:34 UTC, Ola Fosheim Grøstad
I don't know much about current pitch trackers, but I think you
can do a high quality one for voice using filterbanks. Some
people do resynthesis that way (and well, that is just an
alternative to FFT after all).
You are precisely right, if you don't need reconstruction nothing
forces you to use the FFT!
There is also a sample-wise FFT I've come across, which is
expensive but avoids chunking.
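
To make the "sample-wise" idea concrete, here is a minimal sketch of one known technique of that kind, the sliding DFT, which updates a single frequency bin on every incoming sample instead of transforming a whole chunk. Whether this is the exact algorithm meant above is an assumption, and the names sliding_dft and direct_dft_bin are made up for illustration.

```python
import cmath
import math
from collections import deque


def sliding_dft(samples, n, k):
    """Yield bin k of the n-point DFT of the latest n samples, once per sample.

    The window starts out filled with zeros, so the first n - 1 outputs
    describe a partially zero-padded window.
    """
    twiddle = cmath.exp(2j * math.pi * k / n)
    window = deque([0.0] * n, maxlen=n)
    acc = 0j
    for x in samples:
        oldest = window[0]   # sample that is about to leave the window
        window.append(x)     # maxlen drops the oldest sample automatically
        # Remove the oldest sample, add the newest, then rotate the phase.
        acc = (acc - oldest + x) * twiddle
        yield acc


def direct_dft_bin(block, k):
    """Reference: bin k of a plain DFT over one block, used only as a check."""
    n = len(block)
    return sum(x * cmath.exp(-2j * math.pi * k * m / n)
               for m, x in enumerate(block))
```

Each bin costs a constant amount of work per sample, so tracking all n bins costs n updates per sample, which is where the expense comes from. Rounding errors also accumulate in the running sum over long runs, so a real implementation would need some form of damping or periodic reset.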
I assume you can make a better pitch tracker that is
specialized for voice by thinking about FoF synthesis, the
sound of the voice is really a sequence of bursts of roughly
the same shape (like granular synthesis in a way) and you
should be able to figure out some statistical relationship
between formants and how they change with pitch.
Looking for similar grains is the idea behind the popular
autocorrelation pitch detection methods. They need at least two
periods in the analysis window, though, otherwise there is no
autocorrelation peak. Rumor has it that the non-realtime Autotune
works that way, along with many modern pitch detection methods.
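
As a rough illustration of the autocorrelation idea, here is a minimal sketch that picks the lag with the strongest autocorrelation inside a plausible pitch range. The function name autocorr_pitch, the 60 to 500 Hz search range and the brute-force O(N^2) loop are all assumptions chosen for clarity; this is not a description of how Autotune or any particular product works.

```python
import math


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak.

    Returns 0.0 when the frame is silent or too short to search.
    The frame should span at least two periods of the lowest pitch of interest.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if lag_max <= lag_min:
        return 0.0

    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        value = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value

    return sample_rate / best_lag if best_lag else 0.0
```

Because the sum gets shorter as the lag grows, a peak at one period scores higher than the peaks at two or three periods, which helps against octave errors on clean signals. Real trackers add interpolation between lags and more careful normalization.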
I'm not saying it is easy. Probably a lot published on this
I don't know what "voicedness" is? You mean things like vibrato?
Vibrato is the pitch variation that occurs when the larynx is well
Voicedness is the difference between sssssss (unvoiced) and zzzzzz
A phoneme is voiced when there is periodic glottal closure and
When the sound isn't voiced, there is no period. There isn't a
"pitch" there. So pitch detection tends to come with a confidence
The devil in that is that voicedness itself is half a lie, or let's
say a leaky abstraction; it breaks down for distorted vocals.
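
One common way to get such a confidence value is to measure how well a frame matches a shifted copy of itself: close to 1 for a periodic (voiced) frame, close to 0 for noise. Here is a minimal sketch under that assumption. The name periodicity_confidence and the choice of a normalized cross-correlation are mine, not something taken from a specific pitch tracker.

```python
import math


def periodicity_confidence(frame, lag_min, lag_max):
    """Return the best normalized correlation between the frame and a
    shifted copy of itself, searched over lags lag_min..lag_max.

    Values near 1.0 suggest a periodic frame, values near 0.0 suggest noise.
    """
    best = 0.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        head = frame[:len(frame) - lag]
        tail = frame[lag:]
        num = sum(a * b for a, b in zip(head, tail))
        # Normalize by the energy of both overlapping segments.
        den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if den > 0.0:
            best = max(best, num / den)
    return best
```

Thresholding this value gives a crude voiced/unvoiced decision, and it is exactly the kind of measure that gets unreliable on distorted vocals, where the signal is neither cleanly periodic nor plain noise.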
I guess that's why IRCAM can sell licenses to superVP. :)
Their papers on that topic are interesting; they group spectral
peaks by formants and move them together.