On Sunday, 29 November 2015 at 13:21:53 UTC, Ola Fosheim Grøstad wrote:

I remember the electroacoustic people here in Oslo (NoTAM) doing live pitch tracking 20 years ago. I believe they used an envelope follower of some sort, perhaps just measuring the distance between the peaks? The point, I believe, was to have the "electronic accompaniment" follow the lead of the vocalist.
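A minimal sketch of the "distance between the peaks" idea, assuming a clean, roughly sinusoidal input: pick local maxima above a fraction of the block's peak level and convert their mean spacing to Hz. The function name and the 0.5 threshold are illustrative choices, not whatever NoTAM actually ran. On real voices, strong harmonics add extra peaks, which is presumably why an envelope follower or low-pass filter would come first.

```python
import math


def peak_interval_pitch(samples, sample_rate, threshold=0.5):
    """Estimate pitch (Hz) from the mean spacing between waveform peaks.

    A peak is a sample above `threshold` times the block's largest
    absolute value that is greater than the previous sample and not
    smaller than the next. Returns None if fewer than two peaks are found.
    """
    level = max((abs(s) for s in samples), default=0.0)
    if level == 0.0:
        return None
    limit = threshold * level
    peaks = [
        i
        for i in range(1, len(samples) - 1)
        if samples[i] > limit
        and samples[i] > samples[i - 1]
        and samples[i] >= samples[i + 1]
    ]
    if len(peaks) < 2:
        return None
    # Average distance (in samples) between the first and last peak.
    mean_period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return sample_rate / mean_period


fs = 44100
tone = [math.sin(2 * math.pi * 220.0 * n / fs) for n in range(4410)]
print(peak_interval_pitch(tone, fs))  # close to 220
```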



The hard thing about live pitch tracking is getting minimal latency while keeping reliability. It's not that simple. You also want "voicedness" (whether the sound is voiced at all), which is more challenging than pitch.
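One common way to get a voicedness measure is the peak of the normalized autocorrelation over the lags of plausible pitch periods: periodic (voiced) frames score close to 1, noise-like (unvoiced) frames much lower. A rough sketch, assuming `max_lag` is smaller than the frame length; the lag range and any threshold applied to the score are assumptions, and practical trackers (YIN, for instance) refine this considerably.

```python
import math
import random


def voicing_score(frame, min_lag, max_lag):
    """Return (best_lag, score): the lag in [min_lag, max_lag] with the
    highest normalized autocorrelation, and that correlation value.

    Scores near 1.0 suggest a periodic (voiced) frame; noise-like
    (unvoiced) frames score much lower. Returns (None, 0.0) if no lag
    has a positive correlation.
    """
    n = len(frame)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[: n - lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0.0 and num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag, best_score


fs = 44100
voiced = [math.sin(2 * math.pi * 220.0 * i / fs) for i in range(1024)]
rng = random.Random(0)
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
# Lags 40..300 cover roughly 147 Hz to 1100 Hz at 44.1 kHz.
print(voicing_score(voiced, 40, 300))
print(voicing_score(unvoiced, 40, 300))
```

A frame would then be called voiced when the score exceeds some tuned threshold; the value would have to be found empirically.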

But it has different latency characteristics: an overlapped FFT easily goes into the 20 to 30 ms range.
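Where a 20 to 30 ms figure comes from can be estimated with a rough worst-case model (an assumption, not a standard definition): after an onset, you wait for a whole window of post-onset samples, plus up to one hop until the next frame is computed. With a 1024-sample window and a 256-sample hop at 44.1 kHz, that comes to about 29 ms.

```python
def stft_latency_ms(window_size, hop_size, sample_rate):
    """Approximate worst-case delay (ms) from a sound onset until the
    first STFT frame filled with post-onset samples is analysed.

    Rough model: the frame needs `window_size` new samples, and frames
    only start every `hop_size` samples, so up to one extra hop is added.
    """
    return 1000.0 * (window_size + hop_size) / sample_rate


# Window and hop sizes at 44.1 kHz (illustrative values only).
for window, hop in [(1024, 256), (2048, 512)]:
    print(window, hop, round(stft_latency_ms(window, hop, 44100), 1))
```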

It depends on how low in frequency you need to go: a female voice is at 160 Hz and up, and a child's voice at 250 Hz and up. In that frequency range one could do better. And, at the cost of complexity, you could use two FFTs, one for the lower range and another for the higher range.
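The dependence on the lowest frequency can be made concrete by computing the smallest power-of-two window that holds a couple of periods of the lowest expected fundamental. The two-period requirement and the 44.1 kHz sample rate are assumptions for illustration; estimators differ in how many cycles they actually need.

```python
import math


def min_window_size(f_min_hz, sample_rate, periods=2):
    """Smallest power-of-two FFT size holding `periods` cycles of f_min_hz."""
    needed = math.ceil(periods * sample_rate / f_min_hz)
    # Round up to the next power of two (returns `needed` if it already is one).
    return 1 << (needed - 1).bit_length()


SAMPLE_RATE = 44100  # assumed sample rate
for label, f_min in [("child", 250), ("female", 160), ("male", 40)]:
    size = min_window_size(f_min, SAMPLE_RATE)
    print(label, size, f"{1000 * size / SAMPLE_RATE:.1f} ms")
```

Under these assumptions it reports 512 samples (about 11.6 ms) for a 250 Hz floor, 1024 (about 23.2 ms) for 160 Hz and 4096 (about 92.9 ms) for 40 Hz. That gap is what a second, shorter FFT for the upper range would try to exploit.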

I thought about it, but a singer can usually cover a range of 3 octaves, even if very few songs demand it: https://www.youtube.com/watch?v=cveoHrMyUDs&t=41s.
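The arithmetic behind that objection: three octaves is a frequency ratio of 8, so a window sized to hold a couple of periods at the bottom of the range holds 8 times as many at the top, and a fixed split point between two FFTs may well fall inside a single melody. The 80 Hz starting pitch below is only an example.

```latex
\frac{f_{\max}}{f_{\min}} = 2^{3} = 8,
\qquad
f_{\min} = 80\,\mathrm{Hz} \;\Rightarrow\; f_{\max} = 640\,\mathrm{Hz}
```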

A male voice could go as low as, say, 40 Hz.
Even if you needed only one period to estimate the pitch (unlikely), that's already 25 ms of guaranteed latency, and that's before you introduce FFT overlap (if you want, e.g., to track harmonics, get formants through linear prediction, etc.)!
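The 25 ms figure is one period of the lowest fundamental. At an assumed 44.1 kHz sample rate, that is about 1100 samples before any windowing or overlap is added:

```latex
T_0 = \frac{1}{f_0} = \frac{1}{40\,\mathrm{Hz}} = 25\,\mathrm{ms},
\qquad
T_0 \, f_s = 0.025\,\mathrm{s} \times 44100\,\mathrm{Hz} = 1102.5\ \text{samples}
```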

I haven't tried multiple FFTs; I was worried the pitch would lag oddly when switching FFT size. Perhaps it could work.
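One plausible source of that odd lag (a guess, not something measured here): with symmetric windows, a frame's estimate mostly describes the signal around the frame's centre, which trails the newest input by half a window. Two FFT sizes therefore report pitch for different instants, and switching between them makes the track jump in time. A sketch of one compensation, with hypothetical helper names, is to delay the short-window path by half the size difference:

```python
def frame_center_delay_ms(window_size, sample_rate):
    """Delay (ms) from a frame's centre to its newest sample."""
    return 1000.0 * (window_size / 2) / sample_rate


def alignment_delay_samples(long_window, short_window):
    """Extra delay for the short-window path so both paths' frame
    centres refer to the same instant (assumes long_window >= short_window)."""
    return (long_window - short_window) // 2


SAMPLE_RATE = 44100  # assumed sample rate
LONG, SHORT = 4096, 1024  # illustrative FFT sizes
print(frame_center_delay_ms(LONG, SAMPLE_RATE), frame_center_delay_ms(SHORT, SAMPLE_RATE))
print(alignment_delay_samples(LONG, SHORT))
```

The catch is that the aligned short-window path then inherits the long window's delay, which gives back much of the latency the second FFT was meant to save.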


Or maybe one can use wavelets, but I don't know much about wavelet transforms (they don't map to cosines, so I imagine it will be much harder to do well).

I have trouble imagining the reconstruction, so I don't use them (well, I did once, but didn't _get_ it).
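Reconstruction is easiest to see with the Haar wavelet, the simplest case. Each pair of samples becomes a scaled sum (approximation) and difference (detail), and the inverse just undoes that, so the round trip is exact up to floating-point rounding. Repeating the split on the approximation gives a multi-level decomposition. This only illustrates perfect reconstruction. Haar's frequency selectivity is poor, and it is not offered as a pitch-tracking method. The function names are illustrative.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Undo haar_forward: each (a, d) pair gives back two samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_decompose(x, levels):
    """Repeatedly split the approximation; len(x) must be divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Rebuild the signal by inverting the levels in reverse order."""
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


rng = random.Random(1)
signal = [rng.uniform(-1.0, 1.0) for _ in range(16)]
approx, details = haar_decompose(signal, levels=3)
rebuilt = haar_reconstruct(approx, details)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # tiny rounding error
```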
