On Sun, Feb 28, 2021 at 12:41 PM Adrian Musceac <kanto...@gmail.com> wrote:
> While interesting and newsworthy, I'd assume from the start that this codec 
> has the same advantages and pitfalls as other ML applications, i.e. works 
> very well in 90% of cases and fails dramatically in 10% of edge cases. Even 
> though the page specifies it is aimed at a completely different domain (not 
> radio) I would say there is no chance it can replace Codec2 soon, and I think 
> the same for LPCNet unfortunately. Sometimes 99.9% reliability with a 
> robotic voice is better than 90% reliability with a high-quality voice. Amateur 
> radio seems to me like a combination of all kinds of languages, foreign 
> accents and all sorts of messy real world input.

Vocoders have plenty of failure modes too, especially in the presence
of background noise... and they can even produce speech-like artifacts.
And there have been fine examples of traditionally designed speech
codecs that failed pretty badly (e.g. iLBC and tonal languages).

It's perhaps a mistake to think of these "low" CPU-usage ML speech
coding approaches as just being general ML, as they impose a lot of
speech-specific structure.  In design space they're perhaps someplace
in the middle between something like a vocoder with trained codebooks
and a fully general ML approach (though, sure, closer to general ML).
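To make that a bit more concrete, here's a tiny untrained structural
sketch -- my own illustration, not LPCNet or any real codec, and every
name, shape and constant in it is an assumption -- of what "imposing
speech-specific structure" means: the classical source-filter DSP stays
in place, and the learned part only has to model the excitation the
predictor can't explain.

import numpy as np

FRAME = 160        # 10 ms at 16 kHz -- illustrative
LPC_ORDER = 16     # classical short-term predictor order -- illustrative

def lpc_predict(history, a):
    # Classical linear prediction p_t = -sum_k a_k * s_{t-k}: pure DSP,
    # nothing learned here.
    return -np.dot(a, history[::-1])

def excitation_model(prev_sample, prev_excitation):
    # Stand-in for the sample-rate network.  In a real neural vocoder a
    # small recurrent net predicts a distribution over the excitation;
    # returning zero keeps this sketch runnable without a trained model.
    return 0.0

def decode_frame(a, state):
    # Synthesize one frame: the DSP supplies the prediction, the "net"
    # supplies only the residual/excitation.
    out = np.zeros(FRAME)
    hist = state.copy()
    prev_e = 0.0
    for t in range(FRAME):
        p = lpc_predict(hist, a)                   # speech-specific structure
        e = excitation_model(hist[-1], prev_e)     # learned part
        s = p + e                                  # source-filter recombination
        out[t] = s
        hist = np.append(hist[1:], s)
        prev_e = e
    return out, hist

state = np.zeros(LPC_ORDER)
a = np.zeros(LPC_ORDER)        # trivial all-zero predictor just so it runs
frame, state = decode_frame(a, state)

The point is that the network never has to rediscover linear prediction
or basic signal statistics from scratch, which is a big part of why
something like LPCNet gets by on a few GFLOPS rather than the far larger
budgets of fully general sample-by-sample models.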

It's true that they come at a much higher CPU cost, but if you
compare the energy for computation against the energy required for
transmission, their improved bitrate for a given quality might easily
pay for itself from a "how far can I communicate with a given energy
budget" perspective.  Plenty of applications are also more limited by
spectrum and peak power (due to regulations) than by energy, and in
that regime computation is very cheap.  There are also more and more
small computing devices with tensor acceleration units that could make
the extra computation of these approaches reasonably energy efficient
while fitting into small platforms.
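As a rough back-of-envelope (every figure below is a number I'm
assuming purely for illustration, not a measurement of any particular
radio or codec):

TX_RF_POWER_W = 5.0         # assumed handheld RF output
PA_EFFICIENCY = 0.30        # assumed PA + radio overhead efficiency
TX_DC_POWER_W = TX_RF_POWER_W / PA_EFFICIENCY      # ~16.7 W from the battery

CLASSIC_BITRATE = 3200      # bit/s, a conventional low-rate vocoder mode
NEURAL_BITRATE = 1600       # bit/s, assuming "same quality at half the rate"

NEURAL_GFLOPS = 3.0         # assumed LPCNet-class decode complexity
ACCEL_GFLOPS_PER_W = 50.0   # assumed small tensor-accelerator efficiency

compute_power_w = NEURAL_GFLOPS / ACCEL_GFLOPS_PER_W     # ~0.06 W

# If halving the bitrate lets you roughly halve the RF power while keeping
# the same energy per bit (i.e. the same link margin and range), the battery
# cost per second of speech becomes:
classic_j_per_s = TX_DC_POWER_W
neural_j_per_s = TX_DC_POWER_W * NEURAL_BITRATE / CLASSIC_BITRATE + compute_power_w

print(f"classic vocoder: {classic_j_per_s:.1f} J per second of speech")
print(f"neural vocoder : {neural_j_per_s:.1f} J per second of speech "
      f"(including {compute_power_w * 1000:.0f} mW of compute)")

With anything in that neighbourhood the extra compute is noise next to
the transmit energy it saves; or, holding the energy budget fixed
instead, the halved bitrate buys roughly 3 dB more energy per bit and
therefore more reach.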

There was a time when the extra complexity of an FM demodulator or a
single-sideband receiver was a big deal in radio.

All that doesn't mean it's necessarily a total replacement, but it
might well be the *future* of very low bitrate codecs.

I think it's likely that in the long run these approaches, hybridizing
machine learning with knowledge from traditional DSP, will also
replace the high rate codecs used for music and such, but the
computation required will be orders of magnitude greater, so it's not
obviously realistic. Yet.

Even if the ML approach is fundamentally worse than what could
eventually be designed using traditional techniques, there is a
question of development scaling.  If a couple of racks of computers
can, in weeks, construct a better reconstructor than a team of 200
people working for five years could, and ASIC acceleration of
matrix-vector operations makes it energy-efficient enough to use,
then it's going to be the best tool available, because we (the people
of Earth, much less the subset interested in open codecs) won't
dedicate 1,000 person-years to building a better low-rate codec, but
we can get a couple of person-years and a few rack-weeks of CPU time.

And sure, there may be some strange pathological inputs it behaves
weirdly on-- but that is already the case for traditional approaches
too.


