Hi Steve,

What you are trying to do is interesting, even if it's a bit extreme;
but I do agree that you may well be on to another application area.


I mentioned phonetics yesterday, but perhaps I should have said diphones
instead -- that is, the transitions from one sound to the next.  I once read
that they produce the most natural synthesised voice, but that may not be
the most recent information.  Still, it makes a lot of sense as a path for
voice generation; and quite possibly also for voice recognition.

I don't know if the latter has been researched.  I suppose that there
would have to be a sort of lexeme-ish state machine that stays in a
non-deterministic state while a diphone forms, unable to decide which
way it will go, until at some point it can decide (or gain certainty)
about which phoneme the voice is transitioning to.
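
A rough sketch of the kind of state machine I have in mind, in Python.
The diphone templates and the per-frame labels ("gap", "hiss", "hum") are
invented purely for illustration; a real recogniser would work on acoustic
features and probabilities rather than exact label matches.

```python
# Hypothetical templates: each diphone is a short sequence of coarse
# per-frame labels.  All entries are made up for this sketch.
TEMPLATES = {
    ("a", "t"): ["a", "a", "gap", "t"],
    ("a", "s"): ["a", "a", "hiss", "s"],
    ("a", "m"): ["a", "hum", "hum", "m"],
}


def recognise(frames, templates=TEMPLATES):
    """Return (diphone, frames_consumed); diphone is None while undecided.

    The machine keeps every diphone that is still consistent with the
    frames seen so far, and commits once exactly one candidate is left.
    """
    candidates = set(templates)
    for i, label in enumerate(frames):
        # Drop candidates whose template disagrees with this frame.
        candidates = {
            d for d in candidates
            if i < len(templates[d]) and templates[d][i] == label
        }
        if len(candidates) == 1:
            return next(iter(candidates)), i + 1
        if not candidates:
            return None, i + 1  # nothing matches: give up
    return None, len(frames)  # still ambiguous when the input ran out
```

With these toy templates, the frames "a", "a", "gap" are enough to commit
to the a-to-t diphone, while a single "a" frame leaves all three open.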

The set of transitions between phonemes, i.e. the set of possible diphones,
is likely to be limited by physical constraints -- we can't twist our
tongues into any knot that theory can recognise.  This would give rise to a
highly compacted representation, even beyond the mere phonemes or their
Huffman-compressed forms.
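
To put a rough number on that: if every phoneme could follow every other,
each one costs log2(N) bits; if only a few successors are physically
possible, it costs only log2 of that smaller count.  The sketch below
assumes a made-up successor table and equally likely successors, so the
figures are illustrative, not measured.

```python
import math

# Hypothetical table of which phoneme may follow which.
ALLOWED = {
    "a": ["t", "s", "m"],
    "t": ["a"],
    "s": ["a", "t"],
    "m": ["a"],
}


def bits_unconstrained(allowed):
    """Bits per phoneme if any phoneme could follow any other."""
    return math.log2(len(allowed))


def bits_constrained(allowed):
    """Average bits per phoneme when only allowed successors are coded.

    Assumes each current phoneme is equally frequent and its successors
    are equally likely; real speech statistics would differ.
    """
    costs = [math.log2(len(successors)) for successors in allowed.values()]
    return sum(costs) / len(costs)
```

For this toy table the unconstrained cost is 2 bits per phoneme, and the
constrained average comes out well below 1 bit.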


Another angle could be an attempt to reproduce the shape of the mouth and
the position of the tongue, nasality and so on, deriving that info from
each of the Codec2 packets.  This would mean that you are not limited to
an alphabet of phonemes, nor would you need to see context around any
single frame; that is advantageous when packets are lost.  Modelling only
the vocal tract shape would lose the specific sound of the human speaking,
but that is a sacrifice that you are making to get more compression.
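
One classical way to get a rough tract shape is the lossless-tube model:
step the LPC predictor down to reflection coefficients and turn each one
into an area ratio between adjacent tube sections.  The sketch below
assumes the LPC coefficients have already been recovered from the decoded
frame (that conversion is not shown), and the sign convention for the area
ratio differs between textbooks, so which end is the lips is an assumption.

```python
def lpc_to_reflection(a):
    """Step-down recursion.

    `a` holds [a1, ..., ap] for A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    Returns the reflection coefficients [k1, ..., kp].
    """
    a = list(a)
    ks = []
    while a:
        k = a[-1]  # the highest-order coefficient is k_m
        if abs(k) >= 1.0:
            raise ValueError("predictor is not stable")
        ks.append(k)
        m = len(a)
        # Recover the order m-1 predictor from the order m one.
        a = [(a[i] - k * a[m - 2 - i]) / (1.0 - k * k) for i in range(m - 1)]
    ks.reverse()
    return ks


def tube_areas(ks, first_area=1.0):
    """Relative cross-section areas of a lossless tube, one per section.

    Uses the ratio (1 - k) / (1 + k); some texts use the inverse.
    """
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas
```

Only the area ratios mean anything here; the absolute scale is arbitrary,
and nasality is not captured by a single tube at all.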


Quite a lot of new work, I fear.  Probably PhD-thesis size, or otherwise
a project that needs more people to help out?  It seems very useful,
and I've been thinking along similar lines.  It's the kind of thing to
publish about early & often if you want to get others involved, and to
avoid the technological progress-freeze and adoption-restraint that is
usually the result of patents in this field -- just look at G.722, which
is finally being adopted now that its patents have expired.


Cheers, 
 -Rick

_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2