Howdy Robert,

> Alex, if you remember Frank's post about DC offsets, in it he attached
> a little C program to calculate AC/DC offsets as well as a correlation
> between left and right channels. (It was around 00/08/05.)

I'm not sure I read those - DC offsets aren't particularly relevant to my
current efforts (real-time coding).

> I plugged that correlation test into LAME for a single frame.
> If you define RH_VALIDATE_MS at compile time, this code becomes active.
> Currently, the decision whether to use L/R or M/S coding is based
> on masking relations. But sometimes LAME switches to M/S coding
> where L/R coding would use fewer bits. The extra code enabled by
> the above define tries to get a rough estimate of the correlation
> between the left and right channels and considers the perceptual
> entropy to check whether it would be better not to switch
> to M/S coding.

Ah.  So LAME uses (or can use) multiple criteria for switching.

I'm in the process of adding mid/side to my encoder now, and have just put
in a first pass based on what the ISO spec 'specifies' (OK - suggests?
hints at?) in Appendix G.  Their switching criterion seems particularly
random and strange:  it is based on a comparison of the sum and difference
of the squared energies of the two channels:

        sum(rl[i]^2 - rr[i]^2) < 0.8 * sum(rl[i]^2 + rr[i]^2)

The sums run over 0 <= i < 512, where rl and rr are supposedly the energies
of the FFT line spectra (so they are squaring the energy?!?).  I suspect
they meant to put an absolute value around the difference term.
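A sketch of the criterion exactly as quoted, with an optional per-line
absolute value on the difference term (the suspected intended reading,
which is an assumption and not the spec text). The function name
iso_ms_switch and the use_abs flag are hypothetical; n would be 512 here.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper. Returns true (use M/S) when
 *   sum(rl^2 - rr^2) < 0.8 * sum(rl^2 + rr^2)
 * With use_abs set, each difference term becomes |rl^2 - rr^2|
 * (assumed reading, not the spec text). */
bool iso_ms_switch(const double *rl, const double *rr, size_t n, bool use_abs)
{
    double diff = 0.0, sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = rl[i] * rl[i] - rr[i] * rr[i];
        diff += use_abs ? fabs(d) : d;
        sum += rl[i] * rl[i] + rr[i] * rr[i];
    }
    return diff < 0.8 * sum;
}
```

Written without the absolute value, the left side goes negative whenever
the right channel carries more of this "energy" than the left, so the test
then picks M/S regardless of how similar the channels are. With the
absolute value it only passes when the per-line values are reasonably close.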

I don't think this works very well, and I'm not sure why they thought it
would work (i.e. what its theoretical basis was), so I'm thinking of
substituting a correlator.  Do you know why one would correlate on the
differential instead of the signal?

Also, does anyone know the basis for the ISO switching criterion?  Do they
really mean squared energy (magnitude to the fourth power)?  They give no
hints as to how to reconcile the mid/side samples with the left/right
psychoacoustics in the loop section of the encoder.  Computing
psychoacoustics for the sum and difference signals makes no sense to me:
no one is ever going to listen to them, so their psychoacoustic thresholds
are irrelevant.  The alternative, allocating bits/noise for both channels
simultaneously, seems overly complicated, possibly impossible (oxymoron
strikes!).  For now I'm compromising by calculating the distortion
thresholds from the L/R SMR plus the M/S signal and bandwidths (and the
loop distortions from the M/S signal), but this seems like a pretty silly
way to do things, and it doesn't sound very good.  I'm trying not to
plagiarize LAME (I still use the ISO/ATT psych model, for one), but I
gather that LAME does some sort of mid/side psychoacoustic processing?
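For reference, a sketch of the orthonormal L/R to M/S matrix that Layer III
decoders invert (L = (M+S)/sqrt(2), R = (M-S)/sqrt(2)), plus one
conservative way to carry L/R thresholds over to M/S. The threshold rule is
my own assumption, not LAME's or the spec's method, and both function names
are hypothetical. If M and S quantization noise are uncorrelated with powers
nm and ns, each decoded channel gets (nm + ns)/2 of noise, so keeping both
below min(thr_l, thr_r) keeps L and R under their own thresholds.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper. M = (L+R)/sqrt(2), S = (L-R)/sqrt(2).
 * Orthonormal, so total energy is preserved and the decoder recovers
 * L = (M+S)/sqrt(2), R = (M-S)/sqrt(2). */
void lr_to_ms(const double *l, const double *r, double *m, double *s, size_t n)
{
    const double k = 1.0 / sqrt(2.0);
    for (size_t i = 0; i < n; i++) {
        m[i] = (l[i] + r[i]) * k;
        s[i] = (l[i] - r[i]) * k;
    }
}

/* Hypothetical, conservative per-band M/S noise threshold (an assumption):
 * with uncorrelated M and S noise, decoded L and R each see (nm + ns)/2,
 * which stays below min(thr_l, thr_r) if nm and ns both do. */
double ms_threshold(double thr_l, double thr_r)
{
    return thr_l < thr_r ? thr_l : thr_r;
}
```

This wastes bits when one channel's threshold is much higher than the
other's, but it never lets the decoded L/R noise exceed the L/R masking.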

Thanks for any feedback/assistance,
Alex

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
