Hi
I was poking inside the 700 bit/s code a few days ago and ran across the
mbest quantization code. It struck me as suboptimal, not even a full
beam search as far as I'm able to tell. codebook1 is searched first,
then codebook2, rather than both in parallel.
It turns out that the style of quantization the 700 modes use is called
Additive Quantization in the literature. There are ways to speeding up
such encoders, see [1].
I have attached a sketch of how we could implement this for the 700
modes, which use D=20, M=2, K=512. Because the search ends up just being
one table lookup and two subtractions per combination of n1 and n2, I
think we might be able to go with an exhaustive search. A middle ground
could be a wider beam search, possibly tuneable.
I'll likely give this a proper go once I've pushed the codec2 wrapper
into FFmpeg, this Sunday.
Thoughts?
/Tomas SA2TMS
[1]
https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Babenko_Additive_Quantization_for_2014_CVPR_paper.pdf
/**
* Additive Quantization
* https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Babenko_Additive_Quantization_for_2014_CVPR_paper.pdf
* Makes use of the following identity:
*
* ||q-x||^2 = ||q|| - 2*<q,x> + ||x||^2
*
* Since we're minimizing ||q-x||^2 minimizing the latter half is enough (q is constant in each query):
*
* ||x||^2 - 2*<q,x>
*
* or
*
* 0.5*||x||^2 - <q,x>
*
* where every 0.5*||x||^2 can be precomputed (1 MiB for 32-bit floats).
* We want to find n1 and n2 that minimizes the above.
* Something like this:
*
* // x[n1,n2] = codebook1[n1] + codebook2[n2]
* // <q,x> can therefore be decomposed into <q,codebook1> + <q,codebook2>
* // we therefore only need to compute 512+512 = 1024 dot products,
* // and we only need temporary storage for one set of dot products
* float n2dots[512];
* float n2dots_max = -1e512; // for being able to skip entire rows
* for (int n2 = 0; n2 < 512; n2++) {
* n2dots[n2] = dot(q, &codebook2[n2*20]);
* if (n2dots[n2] > n2dots_max) n2dots_max = n2dots[n2];
* }
*
* // exhaustive search
* // we could use a beam search, but 2^18 table lookups and 2^19 subtractions are cheap enough IMO
* float best = 1e512;
* int best_n1 = 0, best_n2 = 0;
* for (int n1 = 0; n1 < 512; n1++) {
* float n1dot = dot(q, &codebook1[n1*20]);
*
* // only bother if we might actually get a better value
* if (xx_half_min[n1] - n1dot - n2dots_max < d) {
* for (int n2 = 0; n2 < 512; n2++) {
* // make use of euclidian distance identity
* float d = xx_half[n1][n2] - n1dot - n2dots[n2];
* if (d < best) {
* best = d;
* best_n1 = n1;
* best_n2 = n2;
* }
* }
* }
* }
*/
static float xx_half[512][512]; // 0.5*||x||^2 LUT, generate at compile-time (CMake option to disable)
static float xx_half_rowmin[512]; //min of each xx_half[n1][...]
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2