[Freetel-codec2] Improving synthesized speech by borrowing ideas from CELT?

Tomas Härdin Fri, 19 Apr 2024 04:31:46 -0700

Hi

Following Bruce's email, I took a look at
https://github.com/drowe67/codec2/blob/main/doc/codec2.pdf to see how
the speech synthesis in codec2 actually works. This is because I suspect
that the NN synth (LPCNet?) is doing something similar to what CELT (one
half of Opus) does, namely filling the off-peak parts of the spectrum with
noise.

Filling the spectrum with noise makes the output sound a lot less
robot-y, and robot-y sound is known in the CELT world to be caused by
collapsing all the energy in a Bark band into a single spectral bin.
Judging by equation (10), that is *precisely what the codec2 synth does*!
Or sort of: for low F0 there may be multiple peaks in some bands.
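As a rough illustration of the "one bin per harmonic" picture, the Python sketch below (standard library only) builds a sparse half-spectrum in which harmonic m occupies a single FFT bin, then counts how many occupied bins land in each Bark band. The bin mapping round(m*F0*N/Fs) with N = 512 and Fs = 8 kHz, the Zwicker-Terhardt Bark approximation and all function names are assumptions made for illustration. This is not a transcription of equation (10), and codec2's actual synthesis may differ in bin mapping and phase handling.

```python
import cmath
import math


def harmonic_spectrum(f0_hz, amps, phases, n_fft=512, fs_hz=8000.0):
    """Sparse half-spectrum: each harmonic m lands in exactly one bin.

    Assumed (illustrative) bin mapping: k = round(m * f0 * n_fft / fs).
    Every other bin stays zero.
    """
    spec = [0j] * (n_fft // 2 + 1)
    for m, (amp, phase) in enumerate(zip(amps, phases), start=1):
        k = round(m * f0_hz * n_fft / fs_hz)
        if k < len(spec):
            spec[k] += amp * cmath.exp(1j * phase)
    return spec


def bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def peaks_per_band(spec, n_fft=512, fs_hz=8000.0):
    """Count the non-zero bins falling in each integer Bark band."""
    counts = {}
    for k, x in enumerate(spec):
        if x != 0:
            band = int(bark(k * fs_hz / n_fft))
            counts[band] = counts.get(band, 0) + 1
    return counts


if __name__ == "__main__":
    for f0 in (200.0, 80.0):
        n_harm = int(4000.0 // f0)  # harmonics up to Fs/2
        spec = harmonic_spectrum(f0, [1.0] * n_harm, [0.0] * n_harm)
        print(f0, peaks_per_band(spec))
```

Comparing the two F0 values shows the caveat above: at 200 Hz the bands are sparsely populated with isolated peaks, while at 80 Hz the wider upper bands each contain several peaks.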

One way to make the output less robot-y could be to convolve Ŝw with a
suitable kernel that smooths out the peaks, spreading each peak's energy
into neighbouring bins. In CELT this is done via clever use of rotations,
if I remember correctly. A floor of comfort noise might also be a good idea.
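One possible reading of the smoothing-plus-comfort-noise idea, sketched in Python under stated assumptions: convolve the complex spectrum with a short real kernel (here a hypothetical 3-tap [0.25, 0.5, 0.25]) so each isolated peak spreads into its neighbours, then replace any bin below a floor, set relative to the strongest bin, with random-phase noise at the floor level. This is a plain convolution, not CELT's rotation-based spreading, and the kernel, `floor_db` and the function names are illustrative choices only.

```python
import cmath
import math
import random


def smooth_spectrum(spec, kernel=(0.25, 0.5, 0.25)):
    """Convolve a complex spectrum with a short, symmetric real kernel.

    Each isolated peak is spread over len(kernel) neighbouring bins.
    Contributions from bins outside the spectrum are treated as zero.
    """
    half = len(kernel) // 2
    out = [0j] * len(spec)
    for k in range(len(spec)):
        for j, w in enumerate(kernel):
            src = k + j - half
            if 0 <= src < len(spec):
                out[k] += w * spec[src]
    return out


def add_noise_floor(spec, floor_db=-30.0, seed=None):
    """Replace bins weaker than a floor with random-phase comfort noise.

    The floor sits floor_db below the strongest bin (hypothetical tuning).
    """
    rng = random.Random(seed)
    peak = max(abs(x) for x in spec)
    floor = peak * 10.0 ** (floor_db / 20.0)
    out = []
    for x in spec:
        if abs(x) < floor:
            # Noise bin: floor magnitude, uniformly random phase.
            out.append(floor * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)))
        else:
            out.append(x)
    return out


if __name__ == "__main__":
    spec = [0j] * 9
    spec[4] = 1 + 0j                   # one isolated harmonic peak
    smoothed = smooth_spectrum(spec)   # peak now spread over bins 3..5
    filled = add_noise_floor(smoothed, floor_db=-20.0, seed=1)
    print([round(abs(x), 3) for x in filled])
```

With a -20 dB floor, the three smoothed bins around the peak are kept and every other bin is filled with noise at one tenth of the peak magnitude. Because the taps sum to 1, an isolated peak away from the spectrum edges keeps its summed value; a wider kernel spreads more but starts smearing closely spaced harmonics together at low F0.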

Oh, and on the topic of errors in the paper, there's a spelling mistake
on page 2: "anlaysed" (should be "analysed").

/Tomas


_______________________________________________
Freetel-codec2 mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
