Hi Tomas and Albert,
700C works by resampling the variable-rate L harmonics (where L varies
frame to frame based on the pitch) to a fixed K=20 vector with
mel-spaced samples. We then VQ the K-sample vector.
This works OK, but does introduce some distortion, which I traced to
under-sampling of high frequency spectral peaks.
Turns out that although the ear is sensitive to log(f), we sometimes
generate sound energy with a narrow bandwidth at high frequencies,
where the mel-spaced samples are furthest apart. That needs to be
captured for higher quality speech.
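
To make the resampling step concrete, here is a rough Python/NumPy sketch,
not the actual 700C code. The 200-3700 Hz sample range, the common
2595*log10(1 + f/700) mel formula, linear interpolation, and the function
names are all assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel formula (assumed here, not taken from the codec source).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_sample_freqs(K=20, f_lo_hz=200.0, f_hi_hz=3700.0):
    # K frequencies equally spaced on the mel axis (range is an assumption).
    mel = np.linspace(hz_to_mel(f_lo_hz), hz_to_mel(f_hi_hz), K)
    return mel_to_hz(mel)

def resample_harmonics(amps_db, f0_hz, K=20):
    # amps_db holds the L harmonic amplitudes at f0, 2*f0, ..., L*f0.
    amps_db = np.asarray(amps_db, dtype=float)
    harm_hz = f0_hz * np.arange(1, len(amps_db) + 1)
    # Linear interpolation (an assumption) onto the fixed mel-spaced grid.
    return np.interp(mel_sample_freqs(K), harm_hz, amps_db)

# Hypothetical frame: f0 = 100 Hz, one narrow peak at the 3200 Hz harmonic.
amps = np.zeros(39)
amps[31] = 20.0
vec = resample_harmonics(amps, f0_hz=100.0)
spacing = np.diff(mel_sample_freqs())
print("sample spacing (Hz), lowest and highest:", spacing[0], spacing[-1])
print("peak in harmonics:", amps.max(), "peak after resampling:", vec.max())
```

With these assumed settings the samples are much further apart at the top of
the band than at the bottom, so a peak one harmonic wide can fall between
two samples and be attenuated or lost.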
Cheers,
David
On 28/06/17 13:27, Albert Cahalan wrote:
On 6/26/17, David Rowe <da...@rowetel.com> wrote:
I'm currently using a 2D DCT approach (bit like JPG) to code a version
of the spectrum. VQ could also be used.
We don't really need to regenerate the upper 4kHz as it doesn't take
many bits to encode it faithfully (perhaps 20% more than the first
4kHz). The log(f) response of the ear means there isn't much info there
we can actually perceive.
You might try a non-linear scaling prior to the 2D DCT.
Squish and stretch it as you would if making a semi-log graph.
That is, so that your spectrum plot shows octaves with equal spacing,
such that you could line it up with a musical keyboard.
A transform of that nature lets you handle the higher frequencies
with an appropriate level of detail.
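
One way to read this suggestion, as a rough Python/NumPy sketch: resample
each frame of the spectrum from a linear frequency axis onto a log-spaced
one, so every octave gets the same number of points, and only then apply
the 2D DCT. The function name, the 16 kHz sample rate, the 100 Hz lower
edge, the 32 output points, and the linear interpolation are all
assumptions for illustration.

```python
import numpy as np

def log_warp_block(block_db, fs_hz=16000.0, n_out=32, f_lo_hz=100.0):
    # block_db: (frames, bins) magnitudes in dB, with bins on a linear
    # frequency axis from 0 Hz to fs_hz / 2 (assumed layout).
    block_db = np.atleast_2d(np.asarray(block_db, dtype=float))
    n_in = block_db.shape[1]
    lin_hz = np.linspace(0.0, fs_hz / 2.0, n_in)
    # Geometric spacing gives equal steps in log2(f), i.e. equal octaves.
    log_hz = np.geomspace(f_lo_hz, fs_hz / 2.0, n_out)
    warped = np.stack([np.interp(log_hz, lin_hz, row) for row in block_db])
    return warped, log_hz

# Hypothetical block of 4 frames with 81 linear bins each.
block = np.random.default_rng(0).normal(size=(4, 81))
warped, log_hz = log_warp_block(block)
print(warped.shape, log_hz[:3], log_hz[-3:])
```

The warped block would then go to the 2D DCT in place of the linear-axis
one. The decoder would need the inverse warp, which is not shown here.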
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2