Hi Mike,

Unfortunately the masking model work didn't lead to a viable Codec 2 mode.

I think an LPCNet-based codec might suit your application well; I'll have
a release in the next few weeks for you to play with (around 2000 bit/s).

Jean-Marc Valin (the author) has ported the code to C so it runs on
general-purpose CPUs, and we have done some optimisation for
ARM NEON.  It's real time on a modern smartphone, and has scope for
further optimisation (help wanted here!)

You don't need any special libraries, and it doesn't (really) need
training for specific speakers, although you could re-train if you
wanted to.
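For a sense of scale, the back-of-envelope arithmetic for compressed audio size is just bit rate times duration. A minimal sketch, assuming a constant bit rate and ignoring any container/framing overhead, using the 3.5-minute clip length from your message:

```python
# Back-of-envelope: compressed audio size at a constant bit rate.
# Assumes no container or framing overhead (a simplification).
def audio_size_bytes(bitrate_bps, duration_s):
    return bitrate_bps * duration_s // 8

clip_s = int(3.5 * 60)  # the 3.5-minute example clip -> 210 s
print(audio_size_bytes(2400, clip_s))  # Codec 2 at 2400 bit/s -> 63000 bytes (~63 kB)
print(audio_size_bytes(2000, clip_s))  # ~2000 bit/s LPCNet mode -> 52500 bytes (~52.5 kB)
```

So the ~2000 bit/s mode should come in a little under your 2400 bit/s Codec 2 estimate for the same clip.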

Cheers,
David

On 23/02/19 19:40, Mike Dawson wrote:
> Hi Codec2 list,
> 
> I'm working together with Samih, looking at shrinking Khan Academy and
> other educational content for our offline library app. I've been trying
> to figure out optimal codec2 encoding / decoding parameters.
> 
> We know who the speaker is in each clip. As far as I can understand, the
> best approach to achieve optimal results with a fixed speaker set, given
> that we have access to the original audio, would be to use the masking
> model outlined here: https://www.rowetel.com/?p=4454. Is this masking
> model per speaker, or per clip?
> 
> I haven't managed to get the masking model running yet, but I made a
> basic script (
> https://gist.github.com/mikedawson/1d66a1d35bd1538b2a9950246ef061a2 ) to
> generate comparison tables using a basket of clips and different
> parameter combinations. The audio from 4 Khan Academy clips with
> different codec2 settings is here:
> 
> https://www.ustadmobile.com/files/codec2/out/
> 
> Using VP9 compression, the video in a 3.5 min clip can be shrunk to just
> under 100kB. If we used 2.4kbps codec2 for the audio, we could get the
> audio down to around 70kB. Combined, that's around 60-70% smaller than
> the smallest 'mobile friendly' mp4 version from Khan Academy. As there
> are around 15,000 videos (in English alone), codec2 could save a huge
> amount of space and bandwidth.
> 
> On the LPCNet topic: this is definitely interesting, but will need
> further investigation. The examples from the masking model sounded
> pretty good. One obstacle I can see is the size of the training file.
> The app has to work offline and we have to keep the app size itself as
> small as possible. Perhaps with a limited speaker set, and no need to
> work on untrained files, this would not be so bad. We would also need to
> get the model to work with TensorFlow Lite. Finally, in many places
> where low bandwidth and device storage are issues, the phones themselves
> often have limited capability (Android 4.4 is still very much alive).
> 
> Any further suggestion on what would be the current recommended /
> optimal approach for a fixed set of speakers would be much appreciated!
> We're very excited about its potential to make this educational content
> more accessible.
> 
> Thanks!
> 
> -Mike
> 
> CEO
> Ustad Mobile
> 
> Email: m...@ustadmobile.com
> Web: www.ustadmobile.com
> Twitter: @ustadmobile
> Facebook: www.facebook.com/Ustad.Mobile
> 
> 
> _______________________________________________
> Freetel-codec2 mailing list
> Freetel-codec2@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/freetel-codec2
> 

