Hi Alex,
In my own tests, the ISO formula was completely unworkable, and I
would recommend not wasting any time on it :-) Similar formulas based on
an L/R correlation also did not work very well. LAME did not have good
mid/side switching until we started looking at the differences between the
L and R channel masking thresholds (which I think would be very similar to
computing a correlation of FFT coefficients in each critical band
and then taking some kind of average).
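For concreteness, here is a minimal sketch of a per-critical-band correlation of FFT coefficients, averaged over bands (the kind of measure reported above not to work well). It assumes one frame of complex FFT coefficients per channel; `band_edges` is a hypothetical list of bin boundaries standing in for a real critical-band table, and the plain mean is just one possible choice for "some kind of average".

```python
import math

def band_correlation_score(fft_l, fft_r, band_edges):
    """Mean normalized L/R correlation over bands (illustrative sketch).

    fft_l, fft_r: complex FFT coefficients of one frame for L and R.
    band_edges: hypothetical bin boundaries, e.g. [0, 4, 8, ...];
    a real encoder would use its own critical-band table.
    """
    scores = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Zero-lag cross term and per-channel energies within the band.
        cross = sum((fft_l[k] * fft_r[k].conjugate()).real for k in range(lo, hi))
        e_l = sum(abs(fft_l[k]) ** 2 for k in range(lo, hi))
        e_r = sum(abs(fft_r[k]) ** 2 for k in range(lo, hi))
        if e_l == 0.0 or e_r == 0.0:
            continue  # skip bands that are silent in either channel
        scores.append(cross / math.sqrt(e_l * e_r))  # in [-1, 1]
    # Plain mean over bands: an assumed choice of "some kind of average".
    return sum(scores) / len(scores) if scores else 0.0

x = [1 + 0j, 2j, 3 + 0j, 4 - 1j]
print(band_correlation_score(x, x, [0, 2, 4]))                   # about 1.0
print(band_correlation_score(x, [-v for v in x], [0, 2, 4]))     # about -1.0
```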
Except for the tunings, the idea used in LAME is not original.
It is based on this paper:
Johnston and Ferreira, Sum-Difference Stereo Transform Coding, Proc. IEEE ICASSP
(1992) p 569-571.
I found out later that this is the same technique used in AAC.
I believe the idea (though this is not clearly explained in any of the
papers) is based on the fact that noise in the Mid or Side channel
will be spread to both the L and R channels during decoding.
If the channels have very different masking thresholds in a given
band, this noise can be masked in one channel but audible in the other.
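A minimal sketch of a switching rule driven by masking-threshold differences, assuming per-band L and R masking thresholds (as energies) are already available from the psychoacoustic model. The 3 dB tolerance and the "every band must agree" rule are illustrative assumptions, not LAME's actual tuning or the paper's exact test; the per-frame decision reflects that Layer III signals M/S once per frame.

```python
def use_mid_side(thr_l, thr_r, max_ratio_db=3.0):
    """Return True if M/S looks safe for this frame (illustrative sketch).

    thr_l, thr_r: per-band masking thresholds (energies) for L and R.
    max_ratio_db: hypothetical tolerance, not a value taken from LAME.
    Assumed rule: every band's L and R thresholds must agree to within
    max_ratio_db; otherwise stay in L/R.
    """
    limit = 10.0 ** (max_ratio_db / 10.0)  # dB -> energy ratio
    for tl, tr in zip(thr_l, thr_r):
        lo, hi = min(tl, tr), max(tl, tr)
        if lo <= 0.0 or hi / lo > limit:
            # Thresholds differ too much: M/S noise spread to both channels
            # could exceed the lower of the two thresholds.
            return False
    return True

print(use_mid_side([1.0, 2.0], [1.1, 2.2]))   # True
print(use_mid_side([1.0, 2.0], [1.0, 20.0]))  # False
```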
Mark
>
> I'm in the process of adding mid/side to my encoder now, and have just put
> in a first pass based on what the ISO spec 'specifies' (OK - suggests?
> hints at?) in Appendix G. Their switching criterion seems particularly
> random and strange: it is based on a comparison of the sum and difference
> of the squared energies of the two channels:
>
> sum(rl[i]^2 - rr[i]^2) < 0.8 * sum(rl[i]^2 + rr[i]^2)
>
> Summing 0<=i<512, where rl and rr are supposedly the energies (so that they
> are squaring the energy?!?) of the FFT line spectra. (I suspect that they
> meant to have an absolute value around the difference term.)
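Evaluating the quoted inequality literally shows why it behaves oddly. This sketch takes `rl` and `rr` as given per-line values (whatever the spec intends them to be) and squares them as in the formula; the `absolute` flag wraps |.| around the summed difference, which is only one reading of the suspected typo.

```python
def iso_ms_criterion(rl, rr, absolute=False):
    """Evaluate sum(rl[i]^2 - rr[i]^2) < 0.8 * sum(rl[i]^2 + rr[i]^2).

    rl, rr: per-line values for the two FFT spectra, taken as given (the
    quoted formula squares them). absolute=True applies |.| to the summed
    difference: an assumed reading of the suspected typo, not the spec's text.
    """
    a = sum(x * x for x in rl)  # sum of rl[i]^2
    b = sum(x * x for x in rr)  # sum of rr[i]^2
    diff = a - b                # equals sum(rl[i]^2 - rr[i]^2)
    if absolute:
        diff = abs(diff)
    return diff < 0.8 * (a + b)

print(iso_ms_criterion([4.0], [1.0]))                 # False
print(iso_ms_criterion([1.0], [4.0]))                 # True (literal form)
print(iso_ms_criterion([1.0], [4.0], absolute=True))  # False
```

Algebraically, with A = sum(rl[i]^2) and B = sum(rr[i]^2), the literal test A - B < 0.8(A + B) reduces to A < 9B: it fails only when the left channel dominates, never the right. With the absolute value it becomes A < 9B and B < 9A, a symmetric check that the two channels' sums are within a factor of 9 of each other.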
>
> I don't think this works very well, and I'm not sure why they thought it
> would work (i.e. what its theoretical basis was), so I'm thinking of
> substituting a correlator. Do you know why one would correlate on the
> differential instead of the signal?
>
> Also, does anyone know the basis for the ISO switching criterion? Do they
> really mean squared energy (magnitude to the fourth power)? They give no hints as to
> how to reconcile the mid/side samples with the right/left psychoacoustics in
> the loop section of the encoder. Computing psychoacoustics for the sum and
> difference signals makes no sense to me, as one is never going to listen to
> them and thus the psychoacoustic threshold figures are irrelevant, but the
> alternative of trying to simultaneously allocate bit/noise for both channels
> seems overly complicated/possibly impossible (oxymoron strikes!). I'm
> compromising right now by just calculating the distortion thresholds using
> the L/R SMR and the M/S signal and bandwidths (and the loop distortions with
> the M/S signal), but this seems like a pretty silly way to do things, and it
> doesn't sound very good. I'm trying not to plagiarize LAME (I still use the
> ISO/ATT psych model, for one), but I gather that LAME does some sort of
> mid/side psychoacoustic processing?
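One conservative way to reuse L/R thresholds when quantizing M and S, sketched under stated assumptions: the orthonormal transform M = (L+R)/sqrt(2), S = (L-R)/sqrt(2), and uncorrelated quantization noise in M and S. The decoded noise power in each of L and R is then (n_M + n_S)/2 per band, so allowing n_M = n_S = min(thr_L, thr_R) keeps both channels at or below their own thresholds. This only illustrates the arithmetic; it is not a description of what LAME or the ISO model does, and it is deliberately pessimistic.

```python
import math

def ms_allowed_noise(thr_l, thr_r):
    """Per-band allowed noise power for M and S (same value for both).

    Assumes orthonormal M/S and uncorrelated M/S quantization noise, so the
    noise reaching L (and R) is (n_m + n_s) / 2; using min(thr_l, thr_r)
    for both keeps that at or below each channel's threshold.
    """
    return [min(tl, tr) for tl, tr in zip(thr_l, thr_r)]

def decoded_noise(n_m, n_s):
    """Expected noise power in L and in R for uncorrelated M/S noise."""
    return (n_m + n_s) / 2.0

def ms_to_lr(m, s):
    """Inverse of the assumed transform: L = (M+S)/sqrt(2), R = (M-S)/sqrt(2)."""
    k = 1.0 / math.sqrt(2.0)
    left = [k * (a + b) for a, b in zip(m, s)]
    right = [k * (a - b) for a, b in zip(m, s)]
    return left, right

# Thresholds for three bands (arbitrary illustrative values).
allowed = ms_allowed_noise([1.0, 4.0, 0.5], [2.0, 3.0, 0.5])
print(allowed)  # [1.0, 3.0, 0.5]
```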
>
> Thanks for any feedback/assistance,
> Alex
>
> --
> MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
>