Hi Jean Marc,

Thanks for your comments.  Yes, it's a nice step forward from 1300.  I've 
tried VQ of the mag spectrum several times in the past but this time 
it's come together.  Experience and attention to detail I guess.

1) I agree - in fact we use your Speex noise suppressor in the FreeDV 
GUI program.

2) It's worth a try but last time we tried joint pitch/energy 
quantisation with prediction it was the most sensitive part of the codec 
to bit errors.  So this would need to be tested carefully, or combined 
with FEC.

3) 40ms is pushing it, there is some reduction in quality, and I'm quite 
surprised it works so well.  Good interpolation helps.  I even played 
with some "analysis by synthesis" interpolation where we try different 
combinations of 3 source 10ms frames over an 80ms period and choose the 
best 3 positions.  Like non-linear sampling.  Didn't get any major 
improvement over simple 1:4 decimation/interpolation (so far).
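
As a rough illustration of the simple 1:4 decimation/interpolation case, 
here is a minimal Python sketch.  It is not the Codec 2 source: the 
function name, the use of plain linear interpolation, and the idea of 
feeding it per-frame parameter vectors (e.g. amplitudes in dB) are all 
assumptions made for the example.

```python
import numpy as np

def decimate_interpolate(frames, M=4):
    """Keep every M-th 10 ms frame and linearly interpolate the others.

    frames: array of shape (n_frames, n_params), one row per 10 ms frame.
    M:      decimation ratio (4 gives one transmitted frame per 40 ms).
    Hypothetical helper for illustration only.
    """
    frames = np.asarray(frames, dtype=float)
    n = frames.shape[0]
    kept = np.arange(0, n, M)          # frames that would be transmitted
    t = np.arange(n)                   # time index of every 10 ms frame
    out = np.empty_like(frames)
    for k in range(frames.shape[1]):
        # np.interp holds the last kept value past the final kept frame
        out[:, k] = np.interp(t, kept, frames[kept, k])
    return out

# A parameter track that changes linearly is recovered exactly.
ramp = np.arange(9, dtype=float)[:, None] * np.ones((1, 3))
rebuilt = decimate_interpolate(ramp, M=4)
```

The "analysis by synthesis" variant would instead try several choices of 
`kept` and retain the one with the lowest reconstruction error.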

4) It's direct VQ of the resampled harmonic amplitude samples - no LPC 
or LSP any more.  I agree 120 seconds is way short.  It was just a 
starting point to test the VQ code but then, well it worked, so I forged 
ahead!  Need to come back to this area.  Yes different low pass/high 
pass filtering makes a difference - as it did for LPC/LSP (e.g. cq_ref 
sample breaks 1300 due to its slope).
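
To make "direct VQ of the resampled harmonic amplitude samples" concrete, 
here is a minimal Python sketch.  The linear frequency grid, K=20, the 
200-3700 Hz range and the helper names are assumptions for illustration, 
not the values or code used in 700C.

```python
import numpy as np

def resample_harmonics(amps_db, f0_hz, K=20, f_lo=200.0, f_hi=3700.0):
    """Resample a variable number of harmonic amplitudes to K fixed points.

    amps_db: amplitudes (dB) of harmonics 1..L, located at m * f0_hz.
    The grid limits and K are assumed values, chosen only for the example.
    """
    amps_db = np.asarray(amps_db, dtype=float)
    harm_freqs = f0_hz * np.arange(1, len(amps_db) + 1)
    grid = np.linspace(f_lo, f_hi, K)
    # Linear interpolation; values outside the harmonic range are held.
    return np.interp(grid, harm_freqs, amps_db)

def vq_search(target, codebook):
    """Return (index, squared error) of the nearest codebook entry."""
    err = np.sum((np.asarray(codebook) - target) ** 2, axis=1)
    best = int(np.argmin(err))
    return best, float(err[best])

# Toy usage: a flat 5 dB spectrum at F0 = 100 Hz, and a tiny codebook.
vec = resample_harmonics(np.full(10, 5.0), f0_hz=100.0, K=4)
idx, err = vq_search(np.array([0.9, 0.0, 0.0]), np.eye(3))
```

The fixed length K is what lets a single codebook cover all pitch values, 
since the number of harmonics changes with F0.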

However it is very nice working directly in the magnitude spectral 
domain.  For example different filters (+/- 6dB/octave slope) could be 
applied to the source vectors before VQ searching.  The slope wouldn't 
even need to be transmitted, as we don't really care when we listen.
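
A minimal Python sketch of that idea, under stated assumptions: the 
vectors are in dB on a known frequency grid, the candidate slopes are 
-6, 0 and +6 dB/octave about a 1 kHz pivot, and the function names are 
hypothetical.  Only the VQ index would be sent; the chosen slope is 
discarded.

```python
import numpy as np

def tilt_db(freqs_hz, slope_db_per_octave, f_ref_hz=1000.0):
    """Gain in dB of a straight-line spectral tilt about f_ref_hz (assumed pivot)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return slope_db_per_octave * np.log2(freqs_hz / f_ref_hz)

def search_with_tilts(target_db, freqs_hz, codebook, slopes=(-6.0, 0.0, 6.0)):
    """Try each tilt on the source vector and keep the best VQ match.

    Returns (codebook index, slope applied, squared error).
    """
    codebook = np.asarray(codebook, dtype=float)
    best = None
    for slope in slopes:
        tilted = np.asarray(target_db, dtype=float) + tilt_db(freqs_hz, slope)
        err = np.sum((codebook - tilted) ** 2, axis=1)
        idx = int(np.argmin(err))
        if best is None or err[idx] < best[2]:
            best = (idx, slope, float(err[idx]))
    return best

# Toy usage: a source falling at 6 dB/octave matches the flat codebook
# entry once a +6 dB/octave tilt has been applied.
freqs = [500.0, 1000.0, 2000.0]
cb = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
idx, slope, err = search_with_tilts([6.0, 0.0, -6.0], freqs, cb)
```

Whether the mean level should be removed again after tilting is left open 
here; it would depend on how energy is quantised.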

Cheers,

David

On 14/01/17 08:00, Jean-Marc Valin wrote:
> Hi David,
>
> Pretty impressive what you can do with just 700 bit/s and (especially)
> with just 40 ms frames (as opposed to "batches" of 80-200 ms). Just a
> few thoughts I had while reading and listening to the samples:
>
> 1) Noise: it seems like performance gets worse with background noise.
> This is of course perfectly normal for a low bitrate codec (and
> especially a vocoder), but I was thinking you might be able to improve
> coding performance (and intelligibility) by applying noise suppression
> on the input. For uncompressed audio, noise suppression doesn't improve
> intelligibility because our brain can do a better job, but with a
> vocoder, noise suppression may help reduce coding artefacts.
>
> 2) Prediction: You mention not using prediction to make the codec more
> robust to packet loss, but I think some amount of prediction could be
> used safely. The idea is that if you get the wrong gain, the speech will
> still be intelligible during convergence, it's just the level that will
> be slightly wrong. For pitch, if you make the voiced/unvoiced decision
> separate, you can predict across voiced frames, so the deltas will be
> very small and again getting out of sync will just cause the pitch to be
> slightly off for a few frames, again not hurting intelligibility. When
> you think about it, even a constant pitch would still be intelligible!
> I'm pretty sure you can do better than what I had published in
> http://jmspeex.livejournal.com/10446.html and still be robust to packet
> loss. You might be able to get away with just 6 or 7 bits, saving 3-4
> bits that you can then spend on a better spectrum.
>
> 3) spectrum: I was wondering how much the 40 ms frames are hurting
> compared to 20 ms, but maybe there's a way to get some of it back by
> coding some temporal information.
>
> 4) LSP VQ: Your results are especially impressive considering the VQ was
> trained on just 120 seconds of speech. Also, I'm not sure what's in
> these 120 seconds, but it's probably a good idea to include as many
> different conditions (frequency responses, noise conditions, speakers,
> ...) as possible. In Speex I made the mistake of having only a single
> condition (clean with modified IRS filter) and that made the LSP
> quantizer not very good for anything else. Even with a single clean
> database, it's easy to add filtering and different types of background
> noise.
>
> Cheers,
>
>       Jean-Marc
>
> On 12/01/17 08:52 PM, David Rowe wrote:
>> Hello Lists,
>>
>> Here is a blog post on the new, and somewhat experimental, Codec 2 700C
>> mode:
>>
>>    http://www.rowetel.com/?p=5373
>>
>> Cheers,
>>
>> David
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Developer Access Program for Intel Xeon Phi Processors
>> Access to Intel Xeon Phi processor-based developer platforms.
>> With one year of Intel Parallel Studio XE.
>> Training and support from Colfax.
>> Order your platform today. http://sdm.link/xeonphi
>> _______________________________________________
>> Freetel-codec2 mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/freetel-codec2
>>
>
