Hi

Following Bruce's email I took a look at
https://github.com/drowe67/codec2/blob/main/doc/codec2.pdf to see how
the speech synthesis in codec2 actually works. I did this because I suspect
that the NN synth (LPCNet?) is doing something similar to what CELT (one
half of Opus) does, namely filling the off-peak parts of the spectrum with
noise.

Filling the spectrum with noise makes the output sound a lot less
robot-y, and robot-y sound is known in the CELT world to be due to
collapsing all energy into a single spectral bin per Bark band which,
looking at equation (10), is *precisely what the codec2 synth does*!
Or, sort of. For low F0 there may be multiple peaks in some bands.
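
To illustrate the "multiple peaks per band" caveat, here is a rough
Python sketch that counts how many harmonics of F0 land in each Bark band
below 4 kHz. The Bark mapping is Traunmueller's approximation, which is my
assumption here; neither the codec2 paper nor CELT's band layout is
reproduced, and the function names are made up for this example.

```python
import math

def bark(f_hz):
    """Traunmueller's approximation of the Bark scale (assumed mapping)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def harmonics_per_band(f0_hz, f_max_hz=4000.0):
    """Count the harmonics of f0_hz that fall in each unit-width Bark band."""
    counts = {}
    m = 1
    while m * f0_hz < f_max_hz:
        band = math.floor(bark(m * f0_hz))
        counts[band] = counts.get(band, 0) + 1
        m += 1
    return counts

low = harmonics_per_band(60.0)    # low-pitched voice
high = harmonics_per_band(300.0)  # high-pitched voice

for name, counts in (("F0=60", low), ("F0=300", high)):
    print(name, dict(sorted(counts.items())))
```

With a low F0 most bands hold several harmonics, while with a high F0 the
lowest bands hold none at all. The bands widen with frequency, so the upper
ones can still catch more than one harmonic.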

One way to make the output less robot-y could be to convolve Ŝw with a
suitable function that smooths out the peaks. In CELT this is done via
clever use of rotations if I remember correctly. A floor of comfort
noise might also be a good idea.
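
A minimal Python sketch of the two ideas, assuming a toy magnitude
spectrum as a stand-in for Ŝw: convolve it with a small kernel to spread
each peak, then apply a noise floor. This is not how CELT does it (that
uses rotations, as far as I remember) and it is not codec2 code. The
Gaussian kernel, the floor level and all function names are arbitrary
choices for illustration.

```python
import math
import random

def line_spectrum(n_bins, f0_bin, amp=1.0):
    """Toy magnitude spectrum: all energy sits in one bin per harmonic."""
    spec = [0.0] * n_bins
    for k in range(f0_bin, n_bins, f0_bin):
        spec[k] = amp
    return spec

def gaussian_kernel(half_width, sigma):
    """Symmetric kernel, scaled so that its squared taps sum to 1."""
    taps = [math.exp(-0.5 * (i / sigma) ** 2)
            for i in range(-half_width, half_width + 1)]
    norm = math.sqrt(sum(t * t for t in taps))
    return [t / norm for t in taps]

def smooth(spec, kernel):
    """Convolve spec with kernel, same output length, zero-padded edges."""
    half = len(kernel) // 2
    out = [0.0] * len(spec)
    for k, s in enumerate(spec):
        if s == 0.0:
            continue
        for j, t in enumerate(kernel):
            idx = k + j - half
            if 0 <= idx < len(spec):
                out[idx] += s * t
    return out

def add_noise_floor(spec, floor, rng):
    """Lift every bin to at least a randomly scaled floor (comfort noise)."""
    return [max(s, floor * rng.uniform(0.5, 1.0)) for s in spec]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
peaky = line_spectrum(n_bins=256, f0_bin=16)
smoothed = smooth(peaky, gaussian_kernel(half_width=3, sigma=1.0))
filled = add_noise_floor(smoothed, floor=0.01, rng=rng)

print("non-zero bins before:", sum(1 for s in peaky if s > 0.0))
print("non-zero bins after smoothing:", sum(1 for s in smoothed if s > 0.0))
print("smallest bin after noise floor:", min(filled))
```

The kernel is scaled to unit energy, so as long as neighbouring peaks do
not overlap the total energy of the smoothed spectrum matches the
original. A real synth would also need random phases for the filled bins,
which this magnitude-only sketch ignores.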

Oh, and on the topic of errors in the paper, there's a spelling mistake
on page 2: "anlaysed" (should be "analysed").

/Tomas


_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
