On Sunday, 29 November 2015 at 13:21:53 UTC, Ola Fosheim Grøstad wrote:
I remember the electroacoustic people here in Oslo (NoTAM)
doing live pitch tracking 20 years ago. I believe they used an
envelope follower of some sort, perhaps just measuring the
distance between the peaks? That was to have the "electronic
accompaniment" follow the lead of the vocalist, I believe.
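To make "measuring the distance between the peaks" concrete, here is a minimal sketch in Python. It is not NoTAM's method (the post only guesses at what they used). The function name `estimate_pitch_from_peaks`, the 44.1 kHz sample rate and the 220 Hz test tone are all assumptions made for illustration. It works on a clean sine; a real voice has strong harmonics that produce extra peaks, which is presumably where an envelope follower or some smoothing would come in.

```python
import math

def estimate_pitch_from_peaks(samples, sample_rate_hz):
    """Estimate f0 from the mean spacing between positive local maxima.

    Naive sketch: only reliable for signals with one peak per period.
    Returns None if fewer than two peaks are found.
    """
    peaks = [
        i for i in range(1, len(samples) - 1)
        if samples[i] > 0.0 and samples[i - 1] < samples[i] >= samples[i + 1]
    ]
    if len(peaks) < 2:
        return None
    # Average distance between consecutive peaks, in samples.
    mean_spacing = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return sample_rate_hz / mean_spacing

# Hypothetical test signal: 0.1 s of a 220 Hz sine at 44.1 kHz.
sample_rate_hz = 44100.0
tone = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate_hz) for n in range(4410)]
print(estimate_pitch_from_peaks(tone, sample_rate_hz))
```

Note that the estimate cannot exist before two peaks have been seen, so even this simple scheme needs at least one full period of input.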
The hard thing about live pitch tracking is getting minimal
latency while keeping reliability. It's not that simple. You also
want "voicedness", which is more challenging than pitch.
But it has different latency characteristics: an overlapped FFT
easily gets into the 20-30 ms range.
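As a rough check on the 20-30 ms figure, the sketch below computes how long an FFT window lasts and how often a new frame arrives with overlap. The 44.1 kHz sample rate, the window sizes and the 75% overlap are assumptions, not values from the thread, and the helper names are made up for the example. The window duration is only a rough bound on buffering delay. Overlap raises the update rate but does not shorten the window itself.

```python
def window_ms(n_samples, sample_rate_hz=44100.0):
    """Duration of an n_samples analysis window, in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz

def hop_ms(n_samples, overlap, sample_rate_hz=44100.0):
    """Time between successive frames for a fractional overlap in [0, 1)."""
    return window_ms(n_samples, sample_rate_hz) * (1.0 - overlap)

for n in (512, 1024, 2048):
    print(n, round(window_ms(n), 2), "ms window,",
          round(hop_ms(n, 0.75), 2), "ms hop at 75% overlap")
```

Under these assumptions a 1024-sample window is about 23 ms long, which lands in the range quoted above.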
It depends on how low in frequency you need to go: a
female voice is at 160 Hz and up, and a child is at 250 Hz and
up. In that frequency range one could do better. And at the
cost of complexity you could use two FFTs, one for the lower
range and another for the higher range.
Thought about it, but a singer can usually cover a range of 3
octaves, even if very few songs demand it.
A man's voice could go as low as, say, 40 Hz.
Even if you needed only one period to guess the pitch
(unlikely), that's already 25 ms of latency guaranteed, and that's
before you introduce FFT overlap (if you want e.g. to track
harmonics, get formants through linear prediction, etc.)!
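The 25 ms figure is just the period of a 40 Hz tone. A small Python sketch of that arithmetic, using the frequencies mentioned in this thread (the function names and the `periods` parameter are made up for illustration):

```python
def period_ms(f0_hz):
    """Duration of one period of a tone at f0_hz, in milliseconds."""
    return 1000.0 / f0_hz

def min_latency_ms(f0_hz, periods=1.0):
    """Lower bound on latency if the tracker must observe `periods` full periods."""
    return periods * period_ms(f0_hz)

for label, f0_hz in (("low male voice", 40.0), ("female voice", 160.0), ("child", 250.0)):
    print(label, f0_hz, "Hz:", min_latency_ms(f0_hz), "ms for one period")
```

If the tracker needs two periods instead of one, `min_latency_ms(40.0, periods=2.0)` gives 50 ms, before any FFT windowing is added on top.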
I've not tried the multiple-FFT approach; I was worried pitch
would lag oddly when changing FFT size. Perhaps it could work.
Or maybe one can use wavelets, but I don't know much about
wavelet transforms (they don't map to cosine, so I imagine it
will be much harder to do well).
I have trouble imagining the reconstruction, so I don't use them
(well, I did once, but didn't _get_ it).