Hi David,

Pretty impressive what you can do with just 700 bit/s and (especially)
with just 40 ms frames (as opposed to "batches" of 80-200 ms). Just a
few thoughts I had while reading and listening to the samples:

1) Noise: it seems like performance gets worse with background noise.
This is of course perfectly normal for a low bitrate codec (and
especially a vocoder), but I was thinking you might be able to improve
coding performance (and intelligibility) by applying noise suppression
on the input. For uncompressed audio, noise suppression doesn't improve
intelligibility because our brain can do a better job, but with a
vocoder, noise suppression may help reduce coding artefacts.

2) Prediction: You mention not using prediction to make the codec more
robust to packet loss, but I think some amount of prediction could be
used safely. The idea is that if you get the wrong gain, the speech will
still be intelligible during convergence, it's just the level that will
be slightly wrong. For pitch, if you make the voiced/unvoiced decision
separate, you can predict across voiced frames, so the deltas will be
very small and again getting out of sync will just cause the pitch to be
slightly off for a few frames, again not hurting intelligibility. When
you think about it, even a constant pitch would still be intelligible!
I'm pretty sure you can do better than what I had published in
http://jmspeex.livejournal.com/10446.html and still be robust to packet
loss. You might be able to get away with just 6 or 7 bits, saving 3-4
bits that you can then spend on a better spectrum.
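
To make the prediction idea concrete, here is a rough sketch of leaky
delta coding of log-pitch across voiced frames. Everything in it is an
assumption made for illustration, not what Codec 2 or Speex actually do:
the leak factor, step size, bit count, mean pitch and the names
PitchCoder and simulate are all hypothetical. The point is only that
with a predictor coefficient below 1, a decoder that misses a frame
drifts back into sync by itself.

```python
import math

LEAK = 0.8                 # hypothetical predictor coefficient (<1 so errors decay)
STEP = 0.02                # hypothetical quantizer step, in log2(Hz) units
BITS = 4                   # hypothetical number of bits per pitch delta
MEAN = math.log2(100.0)    # hypothetical long-term mean of log2(pitch in Hz)


def quantize_delta(x):
    """Uniform scalar quantizer: return a signed index that fits in BITS bits."""
    lo, hi = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
    return max(lo, min(hi, round(x / STEP)))


class PitchCoder:
    """Shared encoder/decoder state: last decoded log2 pitch minus MEAN."""

    def __init__(self):
        self.state = 0.0

    def encode(self, f0_hz):
        target = math.log2(f0_hz) - MEAN
        pred = LEAK * self.state
        idx = quantize_delta(target - pred)
        self.state = pred + idx * STEP   # track what the decoder will rebuild
        return idx

    def decode(self, idx):
        self.state = LEAK * self.state + idx * STEP
        return 2.0 ** (self.state + MEAN)

    def conceal(self):
        # Lost frame: assume a zero delta and just apply the leak.
        self.state = LEAK * self.state
        return 2.0 ** (self.state + MEAN)


def simulate(track_hz, lost=()):
    """Code a voiced pitch track; frames whose index is in `lost` never arrive."""
    enc, dec = PitchCoder(), PitchCoder()
    sent, received = [], []
    for n, f0 in enumerate(track_hz):
        idx = enc.encode(f0)
        sent.append(2.0 ** (enc.state + MEAN))
        received.append(dec.conceal() if n in lost else dec.decode(idx))
    return sent, received


if __name__ == "__main__":
    sent, received = simulate([120.0] * 30, lost={10})
    for n in (9, 10, 11, 15, 29):
        print(n, round(sent[n], 2), round(received[n], 2))
```

After the lost frame, the mismatch between encoder and decoder state
shrinks by a factor of LEAK on every following frame, so the pitch is
only slightly off for a few frames. A leak closer to 1 gives smaller
deltas but slower recovery, so the choice is a trade-off.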

3) Spectrum: I was wondering how much the 40 ms frames are hurting
compared to 20 ms, but maybe there's a way to get some of it back by
coding some temporal information.

4) LSP VQ: Your results are especially impressive considering the VQ was
trained on just 120 seconds of speech. Also, I'm not sure what's in
these 120 seconds, but it's probably a good idea to include as many
different conditions (frequency responses, noise conditions, speakers,
...) as possible. In Speex I made the mistake of having only a single
condition (clean with modified IRS filter) and that made the LSP
quantizer not very good for anything else. Even with a single clean
database, it's easy to add filtering and different types of background
noise.
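
As a rough illustration of that last point, the sketch below turns one
clean recording into several training conditions by applying a simple
spectral tilt and adding noise at a chosen SNR. It is only a sketch
under assumptions: the first-order tilt filter, the white Gaussian
noise, the particular tilt and SNR values, and the function names are
all hypothetical stand-ins for whatever channel responses and real
noise recordings are available.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def tilt_filter(x, a):
    """First-order FIR y[n] = x[n] - a*x[n-1]; a > 0 boosts highs, a < 0 boosts lows."""
    prev = 0.0
    out = []
    for v in x:
        out.append(v - a * prev)
        prev = v
    return out


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise scaled so the result has the requested SNR."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    scale = math.sqrt(power(x) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [v + scale * n for v, n in zip(x, noise)]


def augment(clean, tilts=(-0.5, 0.0, 0.5), snrs_db=(None, 20.0, 10.0), seed=0):
    """Return one variant per (tilt, SNR) pair; an SNR of None means no added noise."""
    rng = random.Random(seed)
    variants = []
    for a in tilts:
        filtered = tilt_filter(clean, a)
        for snr in snrs_db:
            variants.append(filtered if snr is None else add_noise(filtered, snr, rng))
    return variants


if __name__ == "__main__":
    # Stand-in for a clean speech signal: a 200 Hz tone sampled at 8 kHz.
    clean = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
    variants = augment(clean)
    print(len(variants), "training conditions from one clean recording")
```

The LSP vectors for VQ training would then be extracted from every
variant rather than from the clean signal alone.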

Cheers,

        Jean-Marc

On 12/01/17 08:52 PM, David Rowe wrote:
> Hello Lists,
> 
> Here is a blog post on the new, and somewhat experimental, Codec 2 700C 
> mode:
> 
>    http://www.rowetel.com/?p=5373
> 
> Cheers,
> 
> David
> 
> 
> 
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Freetel-codec2 mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/freetel-codec2
> 
