On 12/09/2021 03:33, Greg Maxwell wrote:
https://speechbot.github.io/resynthesis/

https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/

The 365 bps figure is not entirely comparable to more
traditional codecs, because it presumes a per-speaker
embedding is sent once up front.
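If I follow that correctly, the one-time embedding cost gets amortized over the session, so the effective rate depends on how long you talk. A rough back-of-the-envelope sketch (the embedding size below is just an assumed placeholder, not a figure from the paper or blog post):

  # How a one-time speaker embedding shifts the effective bitrate
  # relative to the steady-state 365 bps figure.
  def effective_bps(steady_bps, embedding_bits, duration_s):
      """Steady-state rate plus the one-time embedding amortized over the call."""
      return steady_bps + embedding_bits / duration_s

  STEADY_BPS = 365          # quoted steady-state figure
  EMBEDDING_BITS = 8 * 256  # assumption: 256-byte speaker embedding, sent once

  for seconds in (10, 60, 600):
      rate = effective_bps(STEADY_BPS, EMBEDDING_BITS, seconds)
      print("%4d s call: %7.1f bps effective" % (seconds, rate))

For short transmissions the embedding dominates; for long ones it washes out, which is presumably why the comparison only holds under that "sent once" assumption.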

Does that embedding data also depend on the language the speaker is using? That is, will new data be required if the speaker changes languages? I've encountered this issue before with codecs that require up-front voice parameters, especially with tonal languages such as Cantonese.

Regards,

Steve
