Re: [MP3 ENCODER] LAME M/S thresholds

Menno Bakker Wed, 15 Dec 1999 15:03:18 -0800
Hello,

I don't understand all of the theory myself, but I'll show you what the AAC
document says (it is downloadable from the FAAC website). I find it rather
vague.
This is literally what it says:

<START QUOTE>
M/S Stereo
The decision to code left and right coefficients as either left and right (L/R)
or mid/side (M/S) is made on a noiseless coding band by noiseless coding
band basis for all spectral coefficients in the current block. For each
noiseless coding band the following decision process is used:
1. For each noiseless coding band, not only L and R raw thresholds, but also
M=(L+R)/2 and S=(L-R)/2 raw thresholds are calculated. For the raw M and S
thresholds, rather than using the tonality for the M or S threshold, one
uses the more tonal value from the L or R calculation in each threshold
calculation band, and proceeds with the psychoacoustic model for M and S from
the M and S energies and the minimum of the L or R values for C(w) in each
threshold calculation band. The values that are provided to the imaging
control process are identified in the psychoacoustic model information
section as en(b) (the spread normalized energy) and nb(b), the raw
threshold.
2. The raw thresholds for M, S, L and R, and the spread energy for M, S, L
and R, are all brought into an ``imaging control process''. The resulting
adjusted thresholds are inserted as the values for nb(b) into step 11 of the
psychoacoustic model for further processing.
3. The final, protected and adapted to coder-band thresholds for all of
M,S,L and R are directly applied to the appropriate spectrum by quantizing
the actual L, R, M and S spectral values with the appropriate calculated and
quantized threshold.
4. The number of bits actually required to code M/S, and the number of bits
required to code L/R are calculated.
5. The method that uses the least bits is used in each given noiseless
coding band, and the stereo mask is set accordingly.

With these definitions
Mthr, Sthr, Rthr, Lthr        raw thresholds (the nb(b) from step 10 of the
                              psychoacoustic model)
Mengy, Sengy, Rengy, Lengy    spread energy (en(b) from step 6 of the
                              psychoacoustic model)
Mfthr, Sfthr, Rfthr, Lfthr    final (output) thresholds (returned as nb(b)
                              in step 11 of the psychoacoustic model)
bmax(b)                       BMLD protection ratio, as can be calculated from
bmax(b) = pow(10, -3*(0.5+0.5*cos(Pi*(min(bval(b),15.5)/15.5))))

the imaging control process for each noiseless coding band is as follows:
t=Mthr/Sthr
if (t>1)
    t=1/t
Rfthr=max(Rthr*t, min(Rthr, bmax*Rengy))
Lfthr=max(Lthr*t, min(Lthr, bmax*Lengy))
t=min(Lthr, Rthr)
Mfthr=min(t, max(Mthr, min(Sengy*bmax, Sthr)))
Sfthr=min(t, max(Sthr, min(Mengy*bmax, Mthr)))
<END QUOTE>

Extra:
C(w) = unpredictability
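For anyone who wants to experiment with this, here is a minimal Python transcription of the quoted bmax(b) formula and imaging control process for a single noiseless coding band. It assumes bval (the band's centre frequency in Bark), the raw thresholds nb(b) and the spread energies en(b) have already been computed by the psychoacoustic model; the function names are hypothetical and not taken from FAAC or LAME.

```python
import math

# Illustrative sketch only: names are hypothetical, not FAAC/LAME code.


def bmax(bval: float) -> float:
    """BMLD protection ratio bmax(b) from the quoted formula.

    bval is the band's centre frequency in Bark (assumed precomputed).
    """
    x = min(bval, 15.5) / 15.5
    return 10.0 ** (-3.0 * (0.5 + 0.5 * math.cos(math.pi * x)))


def imaging_control(Lthr, Rthr, Mthr, Sthr,
                    Lengy, Rengy, Mengy, Sengy, bm):
    """Imaging control for one noiseless coding band, as in the quote.

    Takes the raw thresholds nb(b), the spread energies en(b) and
    bm = bmax(b); returns (Lfthr, Rfthr, Mfthr, Sfthr).
    """
    t = Mthr / Sthr  # assumes Sthr > 0
    if t > 1.0:
        t = 1.0 / t  # now t = min(Mthr, Sthr) / max(Mthr, Sthr) <= 1

    # L/R: lowered by t, but never below min(raw threshold, bm * energy)
    # and never above the raw threshold.
    Rfthr = max(Rthr * t, min(Rthr, bm * Rengy))
    Lfthr = max(Lthr * t, min(Lthr, bm * Lengy))

    # M/S: capped by the smaller of the two raw L/R thresholds.
    t = min(Lthr, Rthr)
    Mfthr = min(t, max(Mthr, min(Sengy * bm, Sthr)))
    Sfthr = min(t, max(Sthr, min(Mengy * bm, Mthr)))
    return Lfthr, Rfthr, Mfthr, Sfthr
```

For example, a band at bval = 0 Bark (where bmax is 0.001) with Lthr=1, Rthr=2, Mthr=1.5, Sthr=0.5 and energies Lengy=100, Rengy=200, Mengy=250, Sengy=50 ends up with roughly (0.333, 0.667, 1.0, 0.5): the L/R thresholds are pulled down by the M/S ratio t = 1/3 because the BMLD floor is so low there.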

I tweaked this last piece of code until it sounded best to me
(although more changes may well be possible). I find steps 3, 4 and 5
strange, since they look only at the number of bits.
Maybe one of you really understands this and can make something of it
with a theoretical justification. I find the way I do it now in FAAC very
good; it makes the encoder sound a lot better.
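Steps 4 and 5 of the quoted process reduce to a per-band comparison of bit counts, which is why they look only at bits. A minimal Python sketch, assuming the quantizer and noiseless coder have already produced per-band bit counts for L/R and for M/S (the hypothetical lists bits_lr and bits_ms); the M/S transform from step 1 and its inverse are included for completeness. Ties go to L/R here, which is an arbitrary choice, not something the quote specifies.

```python
def to_mid_side(left, right):
    """Step 1's transform: M = (L+R)/2, S = (L-R)/2 per spectral line."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def stereo_mask(bits_lr, bits_ms):
    """Steps 4 and 5: per band, pick whichever coding needs fewer bits.

    bits_lr and bits_ms are hypothetical per-band bit counts from the
    quantizer. Returns (mask, total_bits), where mask[i] is True if band
    i is coded as M/S. Ties are resolved in favour of L/R (assumption).
    """
    mask = [ms < lr for lr, ms in zip(bits_lr, bits_ms)]
    total = sum(ms if use_ms else lr
                for lr, ms, use_ms in zip(bits_lr, bits_ms, mask))
    return mask, total
```

With bits_lr = [100, 80, 50] and bits_ms = [90, 80, 60] this gives the mask [True, False, False] and 220 bits in total.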

Bye, Menno

> Hi Menno,
>
> I took a look at psy_step11andahalf() and couldn't understand the
> reasoning behind it.  Maybe you could explain some?
>
> I don't have the AAC ISO docs, so I'm just using the Johnson and
> Ferreira reference.  In that paper, the MLD correction is used to
> compensate for stereo demasking, but only at low frequencies.  The MLD
> seems to be constructed so that at high frequencies, the maskings in
> either channel will use the maximum over both channels, but at low
> frequencies masking in one channel will only affect the other channel
> up to the level of the MLD.
>
> Thus at low frequencies, a signal in the mid channel will have less
> masking on the side channel.  Under this theory, the MLD
> must be an increasing function of frequency.
>
> However, looking at this again and comparing with your formulas,
> I think there is a mistake in LAME's implementation:
> The comparison of mid and side maskings should be done with the true
> maskings, not the masking/energy ratios.  I will try to fix this
> today.
>
> Mark
>
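Mark's expectation that the MLD protection grows with frequency can be checked against the bmax(b) formula in the AAC text quoted above. The short Python sketch below restates bmax so it runs on its own (bval is assumed to be in Bark, as in the psychoacoustic model) and tabulates the ratio in dB. Since cos is decreasing on [0, π], bmax rises from -30 dB at 0 Bark through -15 dB at 7.75 Bark to 0 dB at 15.5 Bark, and stays at 0 dB above that.

```python
import math


def bmax(bval):
    """bmax(b) from the quoted AAC formula; bval in Bark (assumed)."""
    x = min(bval, 15.5) / 15.5
    return 10.0 ** (-3.0 * (0.5 + 0.5 * math.cos(math.pi * x)))


def bmax_db(bval):
    """The same protection ratio expressed in dB."""
    return 10.0 * math.log10(bmax(bval))


if __name__ == "__main__":
    # Print a small table of bmax in dB against band frequency in Bark.
    for bval in (0.0, 4.0, 8.0, 12.0, 15.5, 20.0):
        print(f"{bval:5.1f} Bark: {bmax_db(bval):6.1f} dB")
```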


--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )