Hi David,
On 13/01/17 09:39 PM, David Rowe wrote:
> Thanks for your comments. Yes it's a nice step forward from 1300. I've
> tried VQ of the mag spectrum several times in the past but this time
> it's come together. Experience and attention to detail I guess.
What do you mean by "mag spectrum"? The actual interpolated spectral
envelope or the discrete sinusoids (in which case what do you do about
variable pitch)?
> 2) It's worth a try but last time we tried joint pitch/energy
> quantisation with prediction it was the most sensitive part of the codec
> to bit errors. So this would need to be tested carefully, or combined
> with FEC.
Maybe instead of going straight to prediction+VQ like I did, you can try
predicting one parameter at a time. Like many things, it's all about the
details. For example, for the gain, the level of the "zero" is important
because that's the point towards which prediction will tend to
converge in case of error. So you usually want it somewhere in the middle,
but maybe it's better if it's slightly closer to the gain of active
speech or silence. Forcing some sort of gain normalization (AGC) can
probably help too, regardless of whether you use prediction. For pitch
prediction, you probably want to avoid pitch doubling so that most
frames have the same pitch and an error doesn't cause much of a problem.
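
To make the "zero" point concrete, here is a minimal sketch of first-order
leaky prediction for a single gain parameter. It is not Codec 2 code: the
function names, the 40 dB zero level, the 0.8 prediction coefficient and the
2 dB step are all assumptions picked for illustration.

```python
def encode(gains_db, zero_db=40.0, alpha=0.8, step=2.0):
    """Quantise gains with a first-order leaky predictor (illustrative values)."""
    prev = 0.0  # previous reconstructed gain, measured relative to zero_db
    indices = []
    for g in gains_db:
        pred = alpha * prev                      # prediction leaks towards 0
        idx = round((g - zero_db - pred) / step)  # uniform residual quantiser
        indices.append(idx)
        prev = pred + idx * step                 # track what the decoder will see
    return indices


def decode(indices, zero_db=40.0, alpha=0.8, step=2.0):
    """Rebuild the gains in dB from the residual indices."""
    prev = 0.0
    out = []
    for idx in indices:
        prev = alpha * prev + idx * step
        out.append(prev + zero_db)
    return out


if __name__ == "__main__":
    idx = encode([60.0] * 30)
    corrupted = list(idx)
    corrupted[5] += 8  # simulate a bit error in one residual index
    for clean, bad in zip(decode(idx), decode(corrupted)):
        print(f"{clean:6.2f} {bad:6.2f}")
```

Because alpha is below 1, the effect of the corrupted index shrinks by a
factor of alpha every frame, and if the residuals are replaced by zeros the
decoded gain drifts back to zero_db. That is why the choice of zero_db
matters.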
> 3) 40ms is pushing it, there is some reduction in quality, and I'm quite
> surprised it works so well. Good interpolation helps. I even played
> with some "analysis by synthesis" interpolation where we try different
> combinations of 3 source 10ms frames over a 80ms period and choose the
> best 3 positions. Like non-linear sampling. Didn't get any major
> improvement over simple 1:4 decimation/interpolation (so far).
One thing I've always thought might be a good approach for very long
frames would be to compute a 2D DCT of the log spectrum. In 1D the DCT
is just the cepstrum, but in 2D you can capture the temporal shape at
the same time. That way you can remove both the spectral and temporal
redundancy.
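
A rough sketch of that 2D DCT idea, assuming a block of log-magnitude frames
stored as block[time][band]. The helper names dct and dct2 are made up for
this example, and it uses a plain O(N^2) orthonormal DCT-II rather than a
fast transform.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct2(block):
    """2D DCT of block[time][band]: along frequency first, then along time."""
    rows = [dct(frame) for frame in block]       # per-frame, cepstrum-like
    cols = [dct(list(c)) for c in zip(*rows)]    # each coefficient over time
    return [list(r) for r in zip(*cols)]         # back to [time_index][freq_index]


if __name__ == "__main__":
    frame = [10.0, 12.0, 15.0, 11.0, 8.0, 6.0, 5.0, 4.0]  # made-up log spectrum
    coeffs = dct2([frame[:] for _ in range(4)])            # 4 identical frames
    for row in coeffs:
        print(" ".join(f"{v:7.3f}" for v in row))
```

When the spectrum does not change over the block, only the first row of the
output (temporal index 0) is non-zero, so the temporal redundancy shows up as
coefficients that need no bits.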
> However it is very nice working directly in the magnitude spectral
> domain. For example different filters (+/- 6dB/octave slope) could be
> applied to the source vectors before VQ searching. The slope wouldn't
> even need to be transmitted, as we don't really care when we listen.
By slope, I assume you mean a constant slope over time? I guess it's one
step further in the normalization I was suggesting for gain and indeed,
you'd probably save a few bits by equalizing the input. ...or you could
use prediction of the spectrum over time, which would give you
normalization essentially for free.
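
As a toy illustration of the prediction remark, the sketch below applies
per-band first-order prediction over time with no quantisation. The function
name and the 0.9 coefficient are assumptions, not anything from the codec.

```python
def prediction_residuals(frames, alpha=0.9):
    """Per-band first-order prediction residuals over time (no quantisation)."""
    prev = [0.0] * len(frames[0])
    residuals = []
    for frame in frames:
        residuals.append([x - alpha * p for x, p in zip(frame, prev)])
        prev = frame  # without quantisation the decoder rebuilds the frame exactly
    return residuals


if __name__ == "__main__":
    tilt = [6.0 * k for k in range(5)]  # made-up fixed spectral slope in dB
    for r in prediction_residuals([tilt] * 4):
        print(" ".join(f"{v:6.2f}" for v in r))
```

After the first frame, a fixed slope only contributes (1 - alpha) times its
value to the residual, so most of it never reaches the quantiser.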
Cheers,
Jean-Marc
_______________________________________________
Freetel-codec2 mailing list
https://lists.sourceforge.net/lists/listinfo/freetel-codec2