In my opinion, the pitch estimator is the bottleneck of any narrowband vocoder,
whether traditional or neural-network based.
LPCNet uses the (last-century) RAPT algorithm for pitch estimation, and if
the pitch value is wrong, the output quality is very poor.
------------------ Original ------------------
From: "freetel-codec2" <da...@rowetel.com>;
Date: Tue, Sep 14, 2021 06:25 AM
To: "freetel-codec2"<freetel-codec2@lists.sourceforge.net>;
Subject: Re: [Freetel-codec2] facebook speech codec at 365bps
On Mon, 2021-09-13 at 07:24 +0000, Greg Maxwell wrote:
> On Mon, Sep 13, 2021 at 7:05 AM Random via Freetel-codec2
> <freetel-codec2@lists.sourceforge.net> wrote:
> > Is it speaker-independent ?
>
> It's speaker independent with the additional per-speaker data
> mentioned in my post.
>
That sounds like speaker dependence to me.
I encountered this with the early LPCNet work as well (as used in
FreeDV 2020), the quality dropped off significantly for about 10% of
voices (including mine!). However, I haven't tried the latest version
of LPCNet from Jean-Marc; he's been steadily improving his NN model and
codec.
The Lyra paper mentions some specific work in this area, so I'm sure it
will be addressed in time. High-quality, speaker-independent speech
coding at sub-1000 bit/s certainly feels possible.
Another issue to address is robustness to bit errors. In Codec 2 I
avoid inter-frame coding (i.e. coding differences between frames) to keep
some tolerance to high bit error rates. This costs a few bits/s compared
to a super-efficient approach.
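
To illustrate the trade-off, here is a toy sketch of why coding differences
is fragile under bit errors. It is not Codec 2's actual quantiser: the frame
values, the single flipped bit, and all function names are made up for the
example. One corrupted code word damages one frame when each frame is coded
on its own, but every following frame when only differences are sent.

```python
# Toy illustration only: hypothetical per-frame parameter values,
# not the real Codec 2 bit stream or quantiser.

def encode_absolute(values):
    """Send each frame's value on its own (no inter-frame coding)."""
    return list(values)

def decode_absolute(codes):
    return list(codes)

def encode_differential(values):
    """Send the first value, then only frame-to-frame differences."""
    codes = [values[0]]
    for prev, cur in zip(values, values[1:]):
        codes.append(cur - prev)
    return codes

def decode_differential(codes):
    """Rebuild each frame by accumulating the received differences."""
    out = [codes[0]]
    for diff in codes[1:]:
        out.append(out[-1] + diff)
    return out

def count_bad_frames(original, decoded):
    return sum(1 for a, b in zip(original, decoded) if a != b)

frames = [5, 6, 6, 7, 8, 8, 7, 6]   # assumed parameter track, 8 frames
ERROR_FRAME = 2                      # index of the code word hit by the channel
ERROR_MASK = 0b0100                  # a single flipped bit

# Absolute coding: the error stays inside the frame it hit.
abs_codes = encode_absolute(frames)
abs_codes[ERROR_FRAME] ^= ERROR_MASK
abs_bad = count_bad_frames(frames, decode_absolute(abs_codes))

# Differential coding: the error is carried into every later frame.
diff_codes = encode_differential(frames)
diff_codes[ERROR_FRAME] ^= ERROR_MASK
diff_bad = count_bad_frames(frames, decode_differential(diff_codes))

print("bad frames, absolute coding:    ", abs_bad)   # 1
print("bad frames, differential coding:", diff_bad)  # 6
```

A real differential scheme would limit the damage with periodic resets or a
leaky predictor, at the cost of some of the bits it saved.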
I figure tolerance to bit errors might be something we can train for in
NN codecs.
- David
_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2