You could try matching, say, a lowered jaw with low octaves and a cheeky jaw with high octaves.
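
One rough way to try that idea is to compare how much energy sits in the low bins versus the high bins of each spectrum frame. This is only a sketch: the function name, the `split` parameter, and the assumption that the input is a plain list of non-negative magnitudes for one channel (low frequencies first) are all mine, not anything from the Flash API.

```python
def jaw_shape_from_bands(magnitudes, split=0.25):
    """Return a value in [0, 1] describing where the energy sits.

    0.0 means all energy is in the low bins (lowered jaw),
    1.0 means all energy is in the high bins (raised, "cheeky" jaw).
    `split` is a hypothetical tuning knob: the fraction of bins treated as "low".
    """
    if not magnitudes:
        return 0.5  # no data: stay neutral
    cut = max(1, int(len(magnitudes) * split))
    low = sum(magnitudes[:cut])
    high = sum(magnitudes[cut:])
    total = low + high
    if total == 0:
        return 0.5  # silence: stay neutral
    return high / total
```

The returned number could drive a blend between two mouth poses. The split point would need tuning by ear against real vocals.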


This is a software voice, so nailing down vowels should be easier. However, you mention matching recordings with the live data. What is being matched? Some kind of pattern, I suppose. What form would the pattern take? How long
a sample should be checked continuously, etc.?

It's a big topic. I understand your concept of how to do it, but I don't
have the technical expertise or foundation to implement the idea yet.


I have a face that uses computeSpectrum to sync a mouth with dynamic vocal-only MP3s... it works, but it works much like a robot mouth:
the jaw animates by a certain amount based on volume.
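
For reference, the volume-only approach described above amounts to something like the following sketch. It assumes the frame is a list of waveform samples in the range -1 to 1. The name `jaw_opening` and the `max_level` parameter are made up for illustration.

```python
import math

def jaw_opening(samples, max_level=0.5):
    """Map the loudness of one audio frame to a jaw opening in [0, 1].

    Loudness is the RMS (root mean square) of the samples.
    `max_level` is a hypothetical calibration value: the RMS at which
    the jaw is considered fully open.
    """
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, rms / max_level)
```

Because the only input is loudness, every vowel at the same volume produces the same mouth, which is why the result looks robotic.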

I am trying to somehow get vowel approximations so that I can fire off
events to update the mouth UI. Does anyone have any kind of algorithm that can get close enough readings from audio to detect vowels? Anything I can do besides adjusting the mouth shape at random will go miles in making my
face look more realistic.

You really just need to collect profiles to match against. Record people saying stuff and match the recordings with the live data. When they match,
you know what the vocal is saying.
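
Here is a minimal sketch of what "matching against profiles" could look like, assuming each profile is an averaged spectrum for one vowel and each live frame is a spectrum of the same length. Cosine similarity is my choice of comparison, not the only option. The names, the threshold, and the toy profile numbers are all invented for illustration and are not measured data.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two equal-length spectra, ignoring overall volume."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a silent frame matches nothing
    return dot / (norm_a * norm_b)

def match_vowel(frame, profiles, threshold=0.8):
    """Return the name of the best-matching profile, or None if nothing
    scores at least `threshold` (a hypothetical cutoff to tune by hand)."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(frame, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Made-up toy profiles with 4 bins each; real ones would come from recordings.
EXAMPLE_PROFILES = {
    "ah": [0.1, 0.9, 0.8, 0.1],
    "ee": [0.8, 0.1, 0.1, 0.9],
}
```

With the toy data, `match_vowel([0.2, 0.8, 0.9, 0.1], EXAMPLE_PROFILES)` picks `"ah"`, and a silent frame returns `None`, so the mouth can fall back to a neutral shape.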
