https://speechbot.github.io/resynthesis/

https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/

The 365 bps figure is not directly comparable to more traditional
codecs, because it presumes that a per-speaker embedding has already
been sent once.

This need not be a barrier for amateur radio use. You could
easily imagine a scheme that sends the callsign in each frame plus 1
bit per frame of output from a fountain code over the user's speaker
embedding. A receiver that hasn't yet recovered a particular speaker's
embedding could just use a dummy one until it's recovered. Bonus:
with this approach you get a good voice changer as a side effect,
increasing the appeal of ham radio to CB users! :P
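
To make the idea concrete, here is a rough sketch of how the one-bit-per-frame
side channel might work. It is only an illustration under several assumptions
of mine: the embedding is quantized to K bits, each frame carries (or implies)
a frame number, and the "fountain code" is a simple random linear fountain over
GF(2) rather than LT or Raptor. All names (mask_for, fountain_bit,
EmbeddingDecoder, Receiver, DUMMY_EMBEDDING) are hypothetical.

```python
import random

K = 64               # assumed size of the quantized speaker embedding, in bits
DUMMY_EMBEDDING = 0  # assumed placeholder voice used until the real one is known

def mask_for(frame_no, k=K):
    """Pseudo-random non-empty subset of embedding bits for this frame.
    Both ends derive it from the frame number, so it is never transmitted."""
    rng = random.Random(frame_no)
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(k)
    return mask

def parity(x):
    return bin(x).count("1") & 1

def fountain_bit(embedding, frame_no, k=K):
    """The one extra bit sent in a frame: XOR of the selected embedding bits."""
    return parity(embedding & mask_for(frame_no, k))

class EmbeddingDecoder:
    """Collects (frame_no, bit) pairs and solves for the embedding by
    incremental Gaussian elimination over GF(2)."""
    def __init__(self, k=K):
        self.k = k
        self.rows = {}  # pivot (highest set bit of mask) -> (mask, bit)

    def add(self, frame_no, bit):
        mask = mask_for(frame_no, self.k)
        # Reduce the new equation against the stored ones, highest pivot first.
        for pivot in sorted(self.rows, reverse=True):
            if (mask >> pivot) & 1:
                row_mask, row_bit = self.rows[pivot]
                mask ^= row_mask
                bit ^= row_bit
        if mask:  # linearly independent of what we already had: keep it
            self.rows[mask.bit_length() - 1] = (mask, bit)
        return len(self.rows) == self.k  # True once fully determined

    def solve(self):
        """Back-substitute from the lowest pivot up; call only when complete."""
        x = 0
        for pivot in range(self.k):
            mask, bit = self.rows[pivot]
            lower = mask & ~(1 << pivot)
            x |= (bit ^ parity(lower & x)) << pivot
        return x

class Receiver:
    """Tracks one decoder per callsign; returns the embedding to synthesize with."""
    def __init__(self):
        self.decoders = {}
        self.known = {}

    def on_frame(self, callsign, frame_no, bit):
        if callsign not in self.known:
            dec = self.decoders.setdefault(callsign, EmbeddingDecoder())
            if dec.add(frame_no, bit):
                self.known[callsign] = dec.solve()
        return self.known.get(callsign, DUMMY_EMBEDDING)
```

Because every frame's bit is an independent random equation, it doesn't
matter which frames a receiver misses or when it tunes in: it needs a
little more than K received frames in total, and it speaks with the dummy
voice until then.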


_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
