Hi David,

Pretty impressive what you can do with just 700 bit/s and (especially) with just 40 ms frames (as opposed to "batches" of 80-200 ms). Just a few thoughts I had while reading and listening to the samples:
1) Noise: It seems like performance gets worse with background noise. This is of course perfectly normal for a low-bitrate codec (and especially a vocoder), but I was thinking you might be able to improve coding performance (and intelligibility) by applying noise suppression to the input. For uncompressed audio, noise suppression doesn't improve intelligibility because our brain can do a better job, but with a vocoder, noise suppression may help reduce coding artefacts.

2) Prediction: You mention not using prediction in order to make the codec more robust to packet loss, but I think some amount of prediction could be used safely. The idea is that if you get the wrong gain, the speech will still be intelligible during convergence; it's just the level that will be slightly wrong. For pitch, if you make the voiced/unvoiced decision separate, you can predict across voiced frames, so the deltas will be very small, and again getting out of sync will just cause the pitch to be slightly off for a few frames, without hurting intelligibility. When you think about it, even a constant pitch would still be intelligible! I'm pretty sure you can do better than what I had published in http://jmspeex.livejournal.com/10446.html and still be robust to packet loss. You might be able to get away with just 6 or 7 bits, saving 3-4 bits that you can then spend on a better spectrum.

3) Spectrum: I was wondering how much the 40 ms frames are hurting compared to 20 ms, but maybe there's a way to get some of it back by coding some temporal information.

4) LSP VQ: Your results are especially impressive considering the VQ was trained on just 120 seconds of speech. Also, I'm not sure what's in these 120 seconds, but it's probably a good idea to include as many different conditions (frequency responses, noise conditions, speakers, ...) as possible.
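As a rough illustration of point 4, here is a minimal Python sketch of turning one clean recording into several training conditions for a VQ. It is an assumption-laden sketch, not anything taken from Codec 2 or Speex: `tilt_filter`, `add_noise` and `augment` are hypothetical helpers, a first-order spectral tilt stands in for varied frequency responses, white Gaussian noise stands in for recorded background noise, and the tilt and SNR ranges are arbitrary choices.

```python
import math
import random


def tilt_filter(x, b):
    """First-order FIR tilt: y[n] = x[n] - b * x[n-1].

    b > 0 attenuates low frequencies (pre-emphasis-like),
    b < 0 attenuates high frequencies (de-emphasis-like).
    """
    y, prev = [], 0.0
    for s in x:
        y.append(s - b * prev)
        prev = s
    return y


def power(x):
    """Mean power of a non-empty sequence."""
    return sum(s * s for s in x) / len(x)


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise scaled so that the SNR,
    10*log10(power(x) / power(noise)), equals snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    scale = math.sqrt(power(x) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(x, noise)]


def augment(clean, rng, tilt_range=(-0.9, 0.9), snr_range=(5.0, 30.0)):
    """Hypothetical helper: one randomly tilted, noisy copy of `clean`.

    Returns (signal, tilt, snr_db) so the chosen condition can be logged.
    """
    b = rng.uniform(*tilt_range)
    snr_db = rng.uniform(*snr_range)
    return add_noise(tilt_filter(clean, b), snr_db, rng), b, snr_db


if __name__ == "__main__":
    rng = random.Random(1234)
    # Stand-in "clean speech": a 200 Hz tone sampled at 8 kHz for 1 s.
    clean = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
    training_set = [augment(clean, rng) for _ in range(4)]
    for _, b, snr in training_set:
        print(f"tilt={b:+.2f}  snr={snr:.1f} dB")
```

Calling `augment` several times per clean utterance multiplies the conditions the quantizer sees during training; recorded noise and measured channel responses would be more realistic substitutes for the white noise and single tilt used here.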
In Speex I made the mistake of having only a single condition (clean with modified IRS filter) and that made the LSP quantizer not very good for anything else. Even with a single clean database, it's easy to add filtering and different types of background noise.

Cheers,

Jean-Marc

On 12/01/17 08:52 PM, David Rowe wrote:
> Hello Lists,
>
> Here is a blog post on the new, and somewhat experimental, Codec 2 700C
> mode:
>
> http://www.rowetel.com/?p=5373
>
> Cheers,
>
> David
>
> _______________________________________________
> Freetel-codec2 mailing list
> https://lists.sourceforge.net/lists/listinfo/freetel-codec2
