> 
> 
>   Hi Mark,
> 
> Mark> Right now, the spreading function
> Mark> is normalized so that (for example) convolving s3 with a constant
> Mark> function will not remove any energy and return the same constant.
> 
>   Is there any reason why energy should be preserved here?
> 
just a convention.  s3[][] is the normalized spreading function.
Adjustments to the strength of the masking should all
be done in one place, later on in the code. 
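
Here is a minimal C sketch of that convention. It assumes s3 is stored as a flat n-by-n array in which row i holds the spread from every partition j onto partition i. The flat layout and the names normalize_s3 and spread_energy are hypothetical, not LAME's actual code. Normalizing each row to sum to 1 is what makes a constant input come back unchanged. Any overall strength adjustment, such as the .7 factor, is deliberately left out of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: s3[i * n + j] = spread from partition j onto
 * partition i. Scale every row so it sums to 1 ("normalized" s3). */
void normalize_s3(double *s3, size_t n)
{
    assert(s3 != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += s3[i * n + j];
        assert(sum > 0.0);            /* each row must spread something */
        for (size_t j = 0; j < n; j++)
            s3[i * n + j] /= sum;
    }
}

/* Convolve per-partition energy eb[] with s3:
 * ecb[i] = sum_j s3[i][j] * eb[j]. No strength adjustment here. */
void spread_energy(const double *s3, const double *eb, double *ecb, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += s3[i * n + j] * eb[j];
        ecb[i] = acc;
    }
}
```

Each row is normalized separately. As a result, partitions near the ends of the spectrum, where the spreading function is truncated, also return the constant unchanged.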



> Mark> After the spreading function is applied, you can then adjust the
> Mark> strength based on the tonality measure, and if you want a uniform
> Mark> reduction by .7 (1.5db), it can be incorporated later.  
> Mark> (in fact, later it looks like you increase the masking by 2db, so all of 
> Mark> this could be done at that point?)
> 
>   I attenuate by 3dB (0.7 != -1.5dB) here because Zwicker's book says
> that the peak of the masking is 3dB below the masker. I increase the
> masking by 2dB later because I tuned that value with listening
> tests.
> 

I don't know if this is correct, but the definition used in LAME is
db = 10*log10(E), where E is an energy (not an amplitude).  So
10*log10(.7) = -1.5db, i.e. a 1.5db reduction.

Just to keep things consistent with the ISO model, I think ns_psytune
should be implemented as follows:

1. convolve energy with the normalized spreading function.  This gives the
   general shape of the masking.  

2. Now determine the strength of the masking.  The ISO model
   reduces the masking by between 6 and 18db (based on tonality).
   So it will always be at least 6db below the peak, and can never
   approach Zwicker's limit.

   ns_psytune reduces the masking by 1.5db (in definition of s3[]), 
   then later by 6db (NMT), and then increases it by 2db.

3. Finally, make sure the masking has been reduced (from the peak) by at
   least 'minval'.  At low frequencies this is even more restrictive than
   Zwicker's 3db: minval=25db!  It decreases down to 0 (no effect) at
   high frequencies.


> 
>   Yes, I know. But in my experiments, the ISO tonality doesn't
> work as I expect. I want to detect the following: Zwicker's
> book says that the masking produced by a pure tone
> is weaker than that produced by noise spread across a critical band,
> even if their total energies are the same. And if 5 or more
> pure tones of equal energy exist within a critical band,
> they can be treated as noise.
> 

It is good to test both models, but as a point of information: the
spectral spreading approach was used in ISO psymodel 1 (for layer 2).
The persistence-in-time approach is a Fraunhofer invention and is used
in psymodel 2 for layer 3.  In AAC, ISO has dropped psymodel 1
altogether.  Some other interesting information: someone posted (a
long time ago) a dump of various symbols in mp3enc 3.1, and it looked
like FhG had options for several different tonality models :-)




> 
>   The description in Zwicker's book is mainly based on 2 maskers.
> But the book also mentions that someone confirmed the
> theory is also valid for 4 maskers. At least, it's worth
> trying.
> 
agreed.  




> 
> Mark> 4. Your point #5 above is very similar to the ISO formula
> Mark> which is implemented via "minval" threshold.  The strength
> Mark> of the maskings (computed based on tonality) is not allowed
> Mark> to exceed a certain threshold.  The ISO formula is a little
> Mark> more complicated in that this threshold depends on frequency:
> Mark> for low frequencies, minval is more restrictive (resulting
> Mark> in less masking than would be used w/o minval).  
> 
>   I don't understand why minval is necessary.
> 
> 
It is a more elaborate version of Zwicker's 3db limit.  FhG just
decided, for some reason, that this limit depends on frequency.
They use a similar function for M/S masking: the amount by which
the L or R channel is allowed to mask the side channel
is limited by a function very much like minval.


> 
>   Why is energy important here? I think the more important thing is that
> the peak noise must not exceed the masking. I can't explain why yet, but in
> my experiment using an FFT, if a pure tone is located halfway between bins N
> and N+1, the peak coefficient's energy drops by 0.9dB (I still have
> to think about the reason, but this may be because of the window function).
> 

calc_noise computes the energy of the quantization error for each spectral
coefficient.  This spectrum is incredibly noisy, and it is a bad
idea to trust the information in a single MDCT or FFT coefficient:
you have to do some kind of smoothing before using this information
(there are whole books written on how to smooth the spectrum to get
useful information out of it).  I agree that averaging over an entire
critical band (which is what is done now) may be too much smoothing.
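
The C sketch below shows two simple smoothing options. The first is a short moving average over neighboring coefficients. The second is a per-band average, which corresponds to averaging over an entire critical band. Neither function is taken from LAME; the names, the edge handling, and the band-edge array are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Moving average of x[] over 2*half+1 bins, window truncated at edges.
 * half = 0 leaves every coefficient untouched (no smoothing). */
void smooth_moving_average(const double *x, double *y, size_t n, size_t half)
{
    assert(n > 0);
    for (size_t k = 0; k < n; k++) {
        size_t lo = (k >= half) ? k - half : 0;
        size_t hi = (k + half < n) ? k + half : n - 1;
        double sum = 0.0;
        for (size_t i = lo; i <= hi; i++)
            sum += x[i];
        y[k] = sum / (double)(hi - lo + 1);
    }
}

/* Average of x[] over each band; band b covers [edges[b], edges[b+1]). */
void band_average(const double *x, const size_t *edges, size_t nbands,
                  double *out)
{
    for (size_t b = 0; b < nbands; b++) {
        assert(edges[b] < edges[b + 1]);
        double sum = 0.0;
        for (size_t i = edges[b]; i < edges[b + 1]; i++)
            sum += x[i];
        out[b] = sum / (double)(edges[b + 1] - edges[b]);
    }
}
```

The half-width acts as a knob between trusting single coefficients and full band averaging.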

Mark

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
