Greetings lamers,

   First, major kudos for the work that's been done, especially
between versions 3.13 and 3.50. Lame is truly head and shoulders above the
other free encoders, and is starting to give Fraunhofer a run for
their money. It is exciting to think that with continuing refinement
it may surpass the FhG coder.

   I've been doing a fair number of listening tests while archiving my
CD collection, and, while I'm not a "golden ear" by any standards,
what I've come up with may be of interest.

   One of my standard test tracks now is the beginning of track 6 of
the "Ma Vie En Rose" soundtrack. I've put a clip of this up at:

   http://www.cs.berkeley.edu/~raph/mp3/

   This track has a few interesting features. The most relevant one for
lame is the harp glissando at the end of the clip. At 128kbps, if you
listen carefully to the background noise, you hear it modulate in
amplitude.

   To my ears, it sounds like the problem lame has here is consistency
from frame to frame. This is something that FhG excels at, even at low
bit rates. My guess is that they have something in there that
explicitly manages frame-to-frame consistency.

   If you compare coders at low bitrates (~64kbps), you tend to hear
both the background noise variations and a "warbling" effect in tonal
passages. Both sound to me like frame-to-frame variation, but what's
interesting is that the degradation pattern is _not_ the same among
the different codecs. In particular, blade is a lot worse than lame
for the warbling effects (I hear them quite clearly at 128kbps), but
its amplitude variation in the background noise is actually better
than lame's. This suggests to me that there's an aspect of the lame
psychoacoustic model which is optimizing for something other than
background noise consistency.
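
   Here's a very rough Python sketch of how I'd try to put a number on
that frame-to-frame modulation. It assumes you have the original and
the decoded clip as time-aligned 16-bit mono WAVs (the file names are
made up), and uses 576-sample granules; if the per-granule residual
RMS swings a lot, that's the modulation you hear.

import wave
import numpy as np

def frame_rms(path_orig, path_decoded, frame=576):
    def read_mono(path):
        with wave.open(path, "rb") as w:
            pcm = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
            return pcm[::w.getnchannels()].astype(np.float64)
    x, y = read_mono(path_orig), read_mono(path_decoded)
    n = min(len(x), len(y)) // frame * frame
    resid = (x[:n] - y[:n]).reshape(-1, frame)    # coding noise, per granule
    return np.sqrt((resid ** 2).mean(axis=1))     # one RMS value per granule

rms = frame_rms("harp_orig.wav", "harp_lame128.wav")
print("relative spread of residual RMS:", rms.std() / rms.mean())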

   To me, lame's VBR doesn't particularly help with this artifact, by
the way.

   I've also listened to the ftb_samp example. I agree it's an
excellent test, as the degradation at 128kbps is quite noticeable.
Again, to my ears it sounds like a lot of the problem is
frame-to-frame amplitude variation. The sounds are a lot more complex,
though, so it's a little harder for me to pick out what's going on.
Incidentally, if you want to hear a joke, listen to this track at
128kbps with the FhG 2.72 encoder; it does a terrible job (much worse
than lame 3.50).

   I suspect that what makes this track particularly difficult is
their use of chorusing effects. Intuitively, this should make sense -
chorusing basically takes narrow frequency peaks in the source stream
and adds new ones close by in the output stream. And this is, in fact,
exactly what the MP3 psychoacoustic theory says you can't hear well
:). A chorusing unit will also introduce subtle periodic variations. A
lot of what I've heard from the not-even-lame coders sounds like
beating between the frame rate and the periodicity of the chorus unit.
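
   To make that concrete, here's a toy Python sketch of a chorus: a
sine tone plus a copy of itself run through a slowly modulated delay.
The single spectral peak turns into a little cluster of nearby peaks,
and the modulation rate (0.8 Hz here, purely for illustration) is the
kind of thing that can beat against the encoder's ~38 frames/sec.

import numpy as np

fs = 44100
t = np.arange(2 * fs) / fs                            # two seconds
tone = np.sin(2 * np.pi * 1000 * t)                   # 1 kHz source

delay = 0.020 + 0.002 * np.sin(2 * np.pi * 0.8 * t)   # 20 ms +/- 2 ms
wet = np.interp(t - delay, t, tone, left=0.0)         # modulated delay line
chorused = 0.5 * (tone + wet)

spec = np.abs(np.fft.rfft(chorused * np.hanning(len(chorused))))
freqs = np.fft.rfftfreq(len(chorused), 1 / fs)
near = (freqs > 950) & (freqs < 1050)
print("strong bins near 1 kHz:",
      np.count_nonzero(spec[near] > 0.01 * spec.max()))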

   I haven't done any serious work with audio compression, but I have
with image compression, and there are (I think) some interesting
analogies. JPEG, like MP3, is based on breaking the source signal into
blocks (576/192 samples for MP3, 8x8 pixel blocks for JPEG), doing a
DCT, quantizing, and Huffman coding. They are, I think, almost
cousins. There are differences, of course; JPEG doesn't do the subband
thing, and its DCT is 2D rather than 1D. However, even the "stereo" is
in some way analogous: JPEG encodes a three-channel signal (RGB) by
splitting into a "mid" (Y, intensity) and "side" (Cr and Cb, red and
blue chromaticities) signal, and encoding each separately.
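
   Just to make the analogy concrete, here's a toy Python sketch of
the shared "split into blocks, transform, quantize" skeleton (scipy's
DCT is standing in for the real MP3 filterbank/MDCT and the real JPEG
8x8 DCT; the step size is arbitrary, and there's no psychoacoustics,
zigzag, or Huffman stage here):

import numpy as np
from scipy.fft import dct

def mp3_like(signal, block=576, step=16.0):
    blocks = signal[:len(signal) // block * block].reshape(-1, block)
    coeffs = dct(blocks, norm="ortho", axis=1)    # 1-D transform per granule
    return np.round(coeffs / step)                # uniform quantization

def jpeg_like(image, step=16.0):
    h, w = (d // 8 * 8 for d in image.shape)
    blocks = image[:h, :w].reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2)
    coeffs = dct(dct(blocks, norm="ortho"), norm="ortho", axis=-2)
    return np.round(coeffs / step)                # 2-D transform per 8x8 block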

   Perhaps not surprisingly, then, even the artifacts are analogous.
It's well known that JPEG performs very poorly on detailed edges near
a smooth (or even white, in the case of most documents) background.
Remind you of pre-echo? Also, at high compression ratios, you can
easily see the seams at the edge of each 8x8 block.

   Smarter JPEG encoders (like the "optimize" mode in the IJG coder,
which is free software and almost certainly the best coder out there)
explicitly do things to reduce the block-to-block seam artifacts.
Perhaps both camps have something to learn from each other.

   Finally, on the subject of the patent-free coder, you guys probably
know that Huffman coding is not the most information-theoretically
efficient entropy coding scheme. Arithmetic coding probably holds that
title, but is
unfortunately patented by IBM. However, zip-style compression is
capable of squeezing out some of the difference. I've noticed that
gzip reduces the size of .mp3's by 2 - 2.5%. It might make sense to
use zip _instead_ of the Huffman coding. You might also look into some
of the things that the PNG project has done to increase the
effectiveness of zip compression in the lossless domain.
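
   If you want to reproduce the gzip number on your own files, a
few lines of Python like this will do it (the file name is just a
placeholder); the leftover redundancy it finds is a rough indication
of what a better entropy coder could reclaim from the fixed Huffman
tables:

import zlib

def gzip_gain(path):
    data = open(path, "rb").read()
    return 100.0 * (1.0 - len(zlib.compress(data, 9)) / len(data))

print("gzip gain: %.2f%%" % gzip_gain("track06_128.mp3"))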

   Again, kudos for the fabulous work. Lame still has a long way to go
at low bitrates, but at 160kbps it is definitely good enough for my
archive.

Raph

P.S. A bit of friendly spam: I warmly invite all the developers of lame
to sign up for accounts at http://www.advogato.org/
