Howdy All,
In testing my (comparatively naive) hack of the dist10 encoder, I have
discovered that, while it does OK for music, it has real problems with
speech signals. (Caveat: at our lowest overall bitrate of 300kbps for
combined video/audio, we run the audio at 32kbps mono - though we go way up
to 64kbps mono for higher overall bitrate signals, and are aiming to default
at 64kbps stereo [not joint].) In particular, the broadband noise bursts
associated with fricatives really wreak havoc.
My test signal here is spfe49_1 from the AAC SQAM test suite, which is a
female English speaker going on about giving pills to animals. I ran it
through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4)
our current Layer-II encoder.
1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
listen to, as the pink noise bursts end up being narrow band filtered (due
to lack of bits - only the MDCT coeffs closest to the pole are making it
into the bitstream), and there are occasional weird high frequency blips and
arpeggiation which are very annoying.
2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
LSF yet) sounds pretty good. There are occasional minor glitches, but
that's to be expected at this bitrate. However, LAME (as above plus -k to
turn off the filters) sounds pretty similar to what I'm getting. I note
that without the forced resampling, LAME will attempt to downsample to
22050.
3) FhG mp3enc31 (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.
(Man, is it slow, though.) Again, without the forced MPEG-1 sampling rate,
mp3enc31 will attempt to use 22050.
4) Layer-II (64 kbps stereo CRC) sounds good.
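For reference on the filtering mentioned in 2), here is a minimal sketch of
band-limiting a signal before it reaches the encoder, using a Hamming-windowed
sinc FIR lowpass in plain Python. This is not how LAME implements the filters
that -k disables; the 10 kHz cutoff, the tap count, and the names lowpass_fir
and fir_filter are all assumptions chosen purely for illustration.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design a Hamming-windowed sinc lowpass (num_taps should be odd)."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal lowpass impulse response (sinc), with the x == 0 limit.
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        # Hamming window to tame the truncation ripple.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)                         # normalize to unity gain at DC
    return [t / gain for t in taps]


def fir_filter(signal, taps):
    """Direct-form FIR convolution; samples before the start are taken as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out


# Illustrative cutoff only (assumption): 10 kHz at a 44.1 kHz sampling rate.
taps_10k = lowpass_fir(10000.0, 44100.0)
```

Running the PCM through fir_filter(samples, taps_10k) before encoding removes
most of the high-frequency part of a fricative's noise burst, so the encoder no
longer has to spend bits there. Whether that matches what LAME or FhG do
internally is not something this sketch can confirm.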
So my questions are: Is the solution to my problem to filter/downsample
(and use joint, when I get around to coding it up)? That seems to be what
is making the difference in the case of LAME; I assume that FhG is using
some filtering as well, though there's no way to disable it and see for
sure. Are there really just not enough bits for this type of signal at this
bitrate? Why does Layer-II do so much better a job with this type of
signal? Do other codecs (AAC/MPEG-4) handle this kind of signal better as
well? And what is the capital of Assyria?
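On the "not enough bits" question, a back-of-the-envelope sketch of the Layer
III bit budget may help frame things. It assumes standard Layer III framing
(1152 samples per frame for MPEG-1, 576 for MPEG-2 LSF; a 4-byte header; a
2-byte CRC; side info of 17/32 bytes for mono/stereo in MPEG-1 and 9/17 bytes
in LSF) and ignores padding, ancillary data, and the bit reservoir. The
function name main_data_bits_per_sample is hypothetical.

```python
# Side-info size in bytes, keyed by (is_lsf, channels).
SIDE_INFO_BYTES = {
    (False, 1): 17,
    (False, 2): 32,
    (True, 1): 9,
    (True, 2): 17,
}


def main_data_bits_per_sample(bitrate_bps, sample_rate_hz, channels, lsf, crc=True):
    """Average main-data bits available per PCM sample per channel."""
    samples_per_frame = 576 if lsf else 1152
    # Average frame size in bits (padding makes real frames vary by one byte).
    frame_bits = samples_per_frame * bitrate_bps / sample_rate_hz
    # Fixed per-frame overhead: header, optional CRC, side info.
    overhead_bits = 32 + (16 if crc else 0) + 8 * SIDE_INFO_BYTES[(lsf, channels)]
    return (frame_bits - overhead_bits) / (samples_per_frame * channels)


mpeg1_64k_stereo = main_data_bits_per_sample(64000, 44100, 2, lsf=False)
lsf_64k_stereo = main_data_bits_per_sample(64000, 22050, 2, lsf=True)
```

Under these assumptions, 64 kbps stereo with CRC works out to roughly 0.59 bits
per sample per channel at 44.1 kHz, versus roughly 1.29 at 22.05 kHz. That is
more than double, since the lower rate halves the sample count and LSF frames
carry less side info. The numbers say nothing about perceived quality on their
own; they only show how much tighter the budget is at the MPEG-1 rate.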
Inquiring minds wanna know,
Alex
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )