https://speechbot.github.io/resynthesis/
https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/

The 365 bps figure is not entirely comparable to the rates of more traditional codecs, because it presumes that a per-speaker embedding is sent once.

This model need not be a barrier to amateur radio use. You could easily imagine a scheme that sends the callsign in each frame, plus 1 bit per frame of output from a fountain code over the user's speaker embedding. A receiver that hasn't yet recovered a particular speaker's embedding could just use a dummy one until it's recovered.

Bonus: with this approach you get a good voice changer as a side effect, increasing the appeal of ham radio to CB users! :P

_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
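
To make the 1-bit-per-frame fountain idea above a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions that are not in the post: the embedding is treated as a K-bit integer (K = 64 here is arbitrary), the code is a plain random linear fountain code over GF(2), and the receiver is assumed to know a frame number so it can regenerate the same pseudo-random mask as the sender. The names `frame_mask`, `encode_bit` and `Receiver` are made up for this example.

```python
import hashlib
import random

K = 64  # hypothetical embedding size in bits; a real embedding would be larger


def frame_mask(callsign: str, frame_no: int, k: int = K) -> int:
    """Nonzero k-bit mask choosing which embedding bits are XORed in this frame.

    Both ends derive it from the callsign and an (assumed) frame number.
    """
    digest = hashlib.sha256(f"{callsign}:{frame_no}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(k)
    return mask


def encode_bit(embedding: int, callsign: str, frame_no: int, k: int = K) -> int:
    """The single side-channel bit for one frame: parity of the masked bits."""
    return bin(embedding & frame_mask(callsign, frame_no, k)).count("1") & 1


class Receiver:
    """Collects one parity equation per received frame and solves over GF(2)."""

    def __init__(self, callsign: str, k: int = K, dummy: int = 0):
        self.callsign = callsign
        self.k = k
        self.dummy = dummy  # placeholder embedding used until decoding succeeds
        self.rows = {}      # pivot bit position -> (mask, parity)

    def add_frame(self, frame_no: int, bit: int) -> None:
        mask = frame_mask(self.callsign, frame_no, self.k)
        # Reduce the new equation against stored rows (Gaussian elimination).
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in self.rows:
                self.rows[pivot] = (mask, bit)
                return
            row_mask, row_bit = self.rows[pivot]
            mask ^= row_mask
            bit ^= row_bit
        # Mask reduced to zero: this frame was linearly dependent, nothing new.

    def embedding(self) -> int:
        if len(self.rows) < self.k:
            return self.dummy  # not enough independent frames yet
        value = 0
        # Back-substitute from the lowest pivot up; lower bits are solved first.
        for pivot in sorted(self.rows):
            row_mask, row_bit = self.rows[pivot]
            lower = row_mask & ((1 << pivot) - 1)
            solved = row_bit ^ (bin(lower & value).count("1") & 1)
            value |= solved << pivot
        return value
```

The point of the fountain construction is that the receiver can tune in at any frame and lose any frames it likes; it only needs somewhat more than K frames in total before `embedding()` stops returning the dummy. A practical design would more likely use an LT or Raptor code and would need some integrity check on the recovered embedding, neither of which is shown here.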