> 
> I think it is certain that this problem is caused by noise
> concentrated on pure tones.
> The noise on a single MDCT coefficient increases as its
> amplitude increases. This is because the quantized values are
> actually the coefficients raised to the power 3/4.
> 
> According to psychoacoustic theory, it is a wonder
> that using average noise rather than MAXNOISE succeeds.
> Perhaps this is the answer to that question:
> The reason why the vbrtest problem is not apparent in most cases
> is that if the calculated noise of an MDCT coefficient exceeds
> the masking threshold of its SFB, this means that the amplitude
> of that coefficient is large and thus it produces more masking
> itself. So, the real masking at its frequency is larger than
> the calculated masking of its SFB.
> 
> Perhaps the clearest solution is to calculate the masking of
> all MDCT coefficients, and use MAXNOISE. But this is too
> expensive. Perhaps a reasonable solution is to detect peaks
> and calculate their masking separately.
> 
> --
> Naoki Shibata   e-mail: [EMAIL PROTECTED]
> 

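As a rough illustration of the quoted point about the 3/4 power law,
here is a minimal sketch of a power-law quantizer. It is not LAME's
code: the names quantize, dequantize and worst_case_error are made up
for this example, the step size is arbitrary, and it ignores
scalefactors, the global gain and the small rounding offset in the
standard's formula. It only shows that the worst-case error of a
coefficient grows with the quantized value it lands on.

```python
def quantize(x, step):
    """Compress |x| with a 3/4 power law, then round to the nearest integer."""
    return int((abs(x) / step) ** 0.75 + 0.5)


def dequantize(ix, step):
    """Expand a quantized value back with the inverse (4/3) power law."""
    return ix ** (4.0 / 3.0) * step


def worst_case_error(ix, step=1.0):
    """Upper bound on |x - dequantize(quantize(x))| for inputs that round to ix.

    The largest error occurs just below the upper edge of the rounding
    interval, because x**(4/3) is convex.
    """
    upper_edge = (ix + 0.5) ** (4.0 / 3.0) * step
    return upper_edge - dequantize(ix, step)


if __name__ == "__main__":
    for ix in (1, 10, 100, 1000):
        print(ix, worst_case_error(ix))
```

The printed bound rises with ix, which is the amplitude dependence
described in the quoted message.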

The reason you cannot use maxnoise is that in the output of a lapped
transform (like the MDCT) with very short transform lengths (576 and
192 in our case), individual frequency information CANNOT BE TRUSTED!
You have to do some kind of smoothing/convolution before this
information is meaningful.  For example, two identical signals, just
phase shifted and with different low-level noise, can generate very
different maximum values, but to treat these signals differently would
be a mistake.
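
The phase sensitivity can be checked with a toy experiment. The sketch
below is an assumption-laden illustration, not encoder code: it uses a
plain sine-windowed MDCT with N = 64 coefficients (far smaller than
the real block sizes), a single tone placed halfway between two bins,
and made-up function names. It sweeps the phase of the tone and
compares how much the largest coefficient power moves against how much
the power summed over a few neighbouring bins moves.

```python
import math

N = 64  # coefficients per block; illustrative only


def mdct(block):
    """Sine-windowed MDCT: 2*N input samples -> N coefficients."""
    n0 = 0.5 + N / 2
    out = []
    for k in range(N):
        acc = 0.0
        for n in range(2 * N):
            w = math.sin(math.pi * (n + 0.5) / (2 * N))
            acc += w * block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
        out.append(acc)
    return out


def peak_and_band_power(phase, k0=20):
    """Return (largest single-coefficient power, power summed around k0)."""
    omega = math.pi * (k0 + 1) / N  # tone halfway between bins k0 and k0+1
    block = [math.cos(omega * n + phase) for n in range(2 * N)]
    power = [c * c for c in mdct(block)]
    return max(power), sum(power[k0 - 2:k0 + 4])


def spreads(num_phases=64):
    """Return the max/min ratio of the peak power and of the band power over a phase sweep."""
    results = [peak_and_band_power(2 * math.pi * i / num_phases)
               for i in range(num_phases)]
    peaks = [p for p, _ in results]
    bands = [b for _, b in results]
    return max(peaks) / min(peaks), max(bands) / min(bands)


if __name__ == "__main__":
    peak_ratio, band_ratio = spreads()
    print("peak power max/min:", peak_ratio)
    print("band power max/min:", band_ratio)
```

The tone frequency was picked to make the contrast clean. At other
frequencies the summed power of a single block is not perfectly
constant either, which is one more reason to smooth before trusting
the numbers.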

This is the real reason why the psycho-acoustic model does not do 
spectral line-by-line calculations, but first convolves down
to 64 "partition" bands.  (They did not use partition bands
just to save CPU cycles!)

This information is then averaged (or in the case of AAC,
they use min & max) down to the 21 scalefactor bands.
But what do you think of this variation on your idea?

   1. compute psychoacoustics in the 64 partition bands
   2. compute noise in the same 64 partition bands
   3. adjust the scalefactors for scalefactor band N so that
      noise < masking, for all partition bands contained in band N.
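
The three steps above can be sketched roughly as follows. This is
only an illustration under assumptions: the function names and the
sfb_partitions table (which partition bands fall inside each
scalefactor band) are hypothetical, the numbers are toy values, and
real partition bands need not line up neatly with scalefactor band
edges. The second function is the averaging test, included only for
comparison.

```python
def bands_failing_per_partition(noise, mask, sfb_partitions):
    """Scalefactor bands where noise >= masking in ANY contained partition band."""
    return [
        sfb
        for sfb, parts in enumerate(sfb_partitions)
        if any(noise[p] >= mask[p] for p in parts)
    ]


def bands_failing_on_average(noise, mask, sfb_partitions):
    """Scalefactor bands where the AVERAGE noise >= the average masking."""
    failing = []
    for sfb, parts in enumerate(sfb_partitions):
        avg_noise = sum(noise[p] for p in parts) / len(parts)
        avg_mask = sum(mask[p] for p in parts) / len(parts)
        if avg_noise >= avg_mask:
            failing.append(sfb)
    return failing


if __name__ == "__main__":
    # Toy data: 4 partition bands grouped into 2 scalefactor bands.
    noise = [1.0, 1.0, 5.0, 1.0]
    mask = [2.0, 2.0, 2.0, 8.0]
    sfb_partitions = [[0, 1], [2, 3]]
    print(bands_failing_per_partition(noise, mask, sfb_partitions))
    print(bands_failing_on_average(noise, mask, sfb_partitions))
```

In the toy data the second scalefactor band passes the averaging test
but fails the per-partition test, because one partition band has
noise above its own masking. An encoder loop would then adjust the
scalefactor for each failing band and requantize.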

I think this is a good compromise between avenoise and maxnoise,
and it actually simplifies the psychoacoustic model, since
we don't have to map from partition bands down to scalefactor bands.

Mark
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
