On Tue, 12 Oct 1999, Mark Taylor wrote:

> The biggest (and hardest :-) unsolved problem: developing a good measure
> of "quality" to be used for VBR encodings.  Determining the number of
> bits to use based solely on the current psychoacoustic model will give
> pretty bad results.  We have some fixes in there which are slowly
> improving things, but the reality is that an average 128kbs VBR stream
> generally does not sound as good as a fixed 128kbs stream, presumably
> because of a small percentage of frames which are compressed too much.

After A LOT of work on this topic I've come to the conclusion that we need
to use a different perceptual model for VBR selection. Perhaps the model
we use works well for figuring out masking to decide which bands need more
bits, but doesn't work well in the 'you can use any number of bits, please
choose well' VBR case.

The closest I've come to 'not sometimes doing worse than fixed' without
setting a limit (setting a 160K minimum works well, but kind of defeats
the purpose of VBR) is the following:

I compiled a hacked blade to spit out the psy model output and final noise
(per subband) into a file for each frame, and encoded a bunch of songs in
fixed mode.

Then I produced a table:

Encode bitrate, total granule energy (quantized to 3 dB), min noise.

I then encoded with a 'quality like' setting, which looked up the
current frame's total energy in the table and aimed for the
computed minimum noise.
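
A rough sketch of that table in Python, for illustration only. This is not
the original hack (which is gone); the function names and the assumption
that each logged record is (bitrate in kbps, granule energy in dB, noise in
dB) are made up here.

```python
import math

STEP_DB = 3.0  # energy is quantized to 3 dB buckets, as in the table above


def quantize_db(energy_db, step=STEP_DB):
    """Map an energy in dB to the index of its 3 dB bucket."""
    return math.floor(energy_db / step)


def build_table(records):
    """Build the lookup table from fixed-bitrate encodes.

    records: iterable of (bitrate_kbps, energy_db, noise_db), one per granule
    (hypothetical log format). Returns {(bitrate, bucket): min noise seen}.
    """
    table = {}
    for bitrate, energy_db, noise_db in records:
        key = (bitrate, quantize_db(energy_db))
        if key not in table or noise_db < table[key]:
            table[key] = noise_db
    return table


def target_noise(table, bitrate, energy_db):
    """Noise level to aim for, or None if that bucket was never observed."""
    return table.get((bitrate, quantize_db(energy_db)))
```

A real encoder would also need a fallback for buckets the corpus never hit,
for example borrowing the nearest observed bucket.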

This seemed to overcome VBR sounding worse than fixed at the same size.

This was against ~.26, and I trashed it when upgrading. I can redo it
against more current code, but I was waiting for other lame changes to
slow down (or hopefully, for someone to come up with a better method).


Ultimately I'd like to see VBR working like this:

Windowed VBR:
./lame -tb 128 -wvb 320,160,128 -window 1sec,5sec,30sec file.wav out.mp3

Where lame would keep the 1sec avg at/below 320Kb/s, the 5sec avg at/below
160Kb/s and the 30sec avg at/below 128Kb/s, while trying to obtain quality
'equivalent' to a fixed 'quality corpus' at 128.
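
One way the window limits could be enforced, sketched in Python. Nothing
like this exists in lame; the function name and the (seconds, kbps) window
format are assumptions for illustration. It only computes the bit budget
left for the next frame; the usual per-frame maximum would still apply on
top of it.

```python
FRAME_SEC = 1152 / 44100  # one MPEG-1 Layer III frame at 44.1 kHz


def max_bits_for_next_frame(history_bits, windows, frame_sec=FRAME_SEC):
    """Largest size (in bits) the next frame may have.

    history_bits: sizes in bits of frames already written, oldest first.
    windows: list of (window_seconds, cap_kbps), e.g. [(1, 320), (5, 160)].
    Every sliding-window average must stay at or below its cap.
    """
    budget = float("inf")
    for window_sec, cap_kbps in windows:
        frames_in_window = max(1, int(window_sec / frame_sec))
        # Bits allowed across one full window at the capped average rate.
        allowed = cap_kbps * 1000 * frames_in_window * frame_sec
        # Bits already spent by the frames sharing a window with the next one.
        if frames_in_window > 1:
            recent = history_bits[-(frames_in_window - 1):]
        else:
            recent = []
        budget = min(budget, allowed - sum(recent))
    return max(0, int(budget))
```

The tightest window wins, so a burst allowed by the 1sec limit can still be
cut short by the 5sec or 30sec limit.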

Good for streaming.

Savings VBR:
./lame -tb 192 -brg 0.9 file.wav out.mp3

Here lame would try to keep to an average bitrate of 192, going down or up
based on the ratio of quality to the 192kbit quality corpus, with the
amount of alteration scaled by a dampening value. The 0.9 might express
'If you are a little worse than the quality corpus, it's not worth it to
go all the way up to 320 to meet it; if 224 gets you 90% of the way there,
then settle for that.'
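
One possible reading of the dampening value, sketched in Python. The
function name, the per-frame quality scores and the exact rule (go down only
when the corpus quality is fully met, go up only until the dampened target
is met) are assumptions, not a worked-out design.

```python
def choose_bitrate(quality_at, target_kbps, target_quality, brg=0.9):
    """Pick a bitrate for one frame.

    quality_at: {bitrate_kbps: quality score of this frame}, higher is better
                (hypothetical scores from trial encodes).
    target_kbps: the -tb bitrate, which must be a key of quality_at.
    target_quality: quality of the corpus at target_kbps.
    brg: dampening value; 0.9 means 90% of the target is good enough.
    """
    rates = sorted(quality_at)
    if quality_at[target_kbps] >= target_quality:
        # Easy frame: drop to the cheapest rate that still meets the corpus.
        for rate in rates:
            if quality_at[rate] >= target_quality:
                return rate
    # Hard frame: go up, but settle once brg * target is reached.
    for rate in rates:
        if rate >= target_kbps and quality_at[rate] >= brg * target_quality:
            return rate
    return rates[-1]  # nothing gets close enough, use the highest rate
```

With -tb 192 and a frame that scores 0.85 at 192 and 0.92 at 224 against a
corpus quality of 1.0, this settles for 224 instead of climbing to 320.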

Something to keep in mind: A lot of people think that VBR is some magical
solution to make MP3 sound really good when you don't care about storage.
(I used to be one of them.) It's not. If you are really concerned about
quality and not space, then encode at 320. VBR can never be better than
320.

The second example of VBR would be good when space is still quite
important, but you don't mind giving up a lot of encoder performance and a
little space to reduce the risk of an audible artifact.

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
