https://arxiv.org/abs/2102.09660

Here is the paper.

------------------ Original Message ------------------
From: "freetel-codec2" <kanto...@gmail.com>;
Date: Monday, March 1, 2021, 3:37 PM
To: "freetel-codec2" <freetel-codec2@lists.sourceforge.net>;
Subject: Re: [Freetel-codec2] Codec2 vs Lyra?



Thanks, Greg, for the comments. I'm not a specialist, far from it. I've just
seen ML applied in other domains, and it did not inspire much confidence;
however, maybe this will indeed be different. If anyone can pull this off,
it is certainly Google. They have access to a very large amount of data for
training the model.

I will read the paper, thanks David; I was browsing on mobile and didn't see
the link.

Adrian

On February 28, 2021 9:35:45 PM UTC, Greg Maxwell <gmaxw...@gmail.com>
wrote: On Sun, Feb 28, 2021 at 12:41 PM Adrian Musceac <kanto...@gmail.com>
wrote:
While interesting and newsworthy, I'd assume from the start that this codec
has the same advantages and pitfalls as other ML applications, i.e. it works
very well in 90% of cases and fails dramatically in the remaining 10% of
edge cases. Even though the page specifies it is aimed at a completely
different domain (not radio), I would say there is no chance it can replace
Codec2 soon, and I think the same of LPCNet, unfortunately. Sometimes 99.9%
reliability with a robotic voice is better than 90% reliability with
high-quality voice. Amateur radio seems to me like a combination of all
kinds of languages, foreign accents, and all sorts of messy real-world input.

Vocoders have plenty of failure modes too, especially in the presence
of background noise... and they can even produce speechlike artifacts.
And there have been fine examples of traditionally designed speech
codecs that failed pretty badly (e.g. iLBC and tonal languages).

It's perhaps a mistake to think of these "low" CPU usage ML speech
coder approaches as just being general ML, as they impose a lot of
speech-specific structure. In design space they're perhaps somewhere
in the middle between something like a vocoder with trained codebooks
and a fully general ML approach (though, sure, closer to general ML).
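
To make the "middle of the design space" point concrete, here is a rough
Python sketch of that hybrid shape (purely illustrative: the feature set,
the TinySynth network, and its sizes are my own assumptions, not the actual
design of LPCNet or Lyra). A hand-designed DSP front end produces a compact
acoustic description per frame, and only the waveform reconstruction is
learned:

import numpy as np

# Hypothetical hybrid coder sketch: classic DSP analysis in front,
# a small learned synthesizer behind it. 10 ms frames at 8 kHz.
FRAME = 80

def dsp_features(frame):
    # Hand-designed vocoder-style analysis: frame energy plus a crude
    # cepstral envelope (a stand-in for LPC/LSP analysis). Nothing
    # here is learned.
    energy = np.sqrt(np.mean(frame ** 2))
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-9))[:18]
    return np.concatenate(([energy], cepstrum))  # 19 numbers per frame

class TinySynth:
    # Stand-in for the learned synthesis network. Real weights would
    # come from training on a large speech corpus; these are random.
    def __init__(self, n_feat=19, hidden=64):
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(0.0, 0.1, (n_feat, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, FRAME))

    def synthesize(self, feats):
        h = np.tanh(feats @ self.w1)   # tiny nonlinear layer
        return h @ self.w2             # one frame of reconstructed audio

# Only the features are quantized and transmitted; the network lives
# at both ends of the link.
frame = np.random.default_rng(1).normal(size=FRAME)  # stand-in for speech
audio = TinySynth().synthesize(dsp_features(frame))

The more structure you push into the DSP side, the smaller and cheaper the
learned part can be; that is the tradeoff these hybrid coders are making.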

It's true that they come at a much higher CPU cost, but if you
compare the energy spent on computation against the energy required for
transmission, their improved bitrate at a given quality might easily
pay for itself from a "how far can I communicate with a given energy
budget" perspective. Plenty of applications are also more spectrum- and
peak-power limited than energy limited due to regulations, and in that
model computation is very cheap. It's also the case that there are
increasingly many small computing devices with tensor acceleration
units that could make the extra computation of these approaches
reasonably energy efficient and also fit into small platforms.
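
To put rough numbers on the "pays for itself" claim, here's a
back-of-the-envelope Python calculation. Every figure in it (transmit
power, bitrates, CPU draw, and the fixed-modem-rate duty-cycle model
itself) is an assumption of mine for illustration, not a measurement of
Codec2, Lyra, or any particular radio:

# Back-of-the-envelope energy comparison. All numbers are assumed.
TX_POWER_W  = 5.0    # transmitter power while keyed (assumed)
MODEM_BPS   = 3200   # fixed over-the-air modem rate (assumed)
CODEC_A_BPS = 3200   # conventional low-rate codec (assumed)
CODEC_B_BPS = 1200   # neural codec at comparable quality (assumed)
CPU_A_W     = 0.05   # classic DSP encode/decode power (assumed)
CPU_B_W     = 1.0    # neural inference power (assumed)

def joules_per_speech_second(codec_bps, cpu_w):
    # With a fixed modem rate, fewer payload bits mean less airtime
    # (e.g. duty-cycled or store-and-forward transmission).
    airtime = codec_bps / MODEM_BPS      # seconds on-air per second of speech
    return TX_POWER_W * airtime + cpu_w  # radio energy + compute energy

a = joules_per_speech_second(CODEC_A_BPS, CPU_A_W)
b = joules_per_speech_second(CODEC_B_BPS, CPU_B_W)
print(f"conventional: {a:.2f} J/s, neural: {b:.2f} J/s")
# -> conventional: 5.05 J/s, neural: 2.88 J/s: with these assumptions the
#    neural codec wins despite burning 20x the CPU power, because airtime
#    dominates the energy budget.

Flip the assumptions (QRP power levels, hungrier inference hardware) and
the balance flips too; the point is that it's a tradeoff to be computed,
not a fixed verdict.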

There was a time when the extra complexity of an FM demodulator or
a single-sideband receiver in radio was a big deal.

All that doesn't mean it's necessarily a total replacement, but it
might well be the *future* of very low bitrate codecs.

I think it's likely that in the long run these approaches, hybridizing
machine learning with knowledge from traditional DSP, will also
replace the high-rate codecs used for music and such, but the
computation required will be orders of magnitude greater, so it's not
obviously realistic. Yet.

Even if the ML approach is fundamentally worse than what could
eventually be designed using traditional techniques, there is a
question of development scaling: if a couple of racks of computers
can, in weeks, construct a better reconstructor than a team of 200
people could working for five years, and ASIC acceleration of
matrix-vector operations makes it energy efficient enough to use,
then it's going to be the best tool available, because we (the people
of Earth, much less the subset interested in open codecs) won't
dedicate 1000 person-years to building a better low-rate codec, but we
can get a couple of person-years and a few rack-weeks of CPU time.

And sure, maybe there are some strange pathological inputs it will
behave weirdly on, but that is already the case for traditional
approaches too.
_______________________________________________
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2
