Hi Mark,
Mark> Right now, the spreading function
Mark> is normalized so that (for example) convolving s3 with a constant
Mark> function will not remove any energy and will return the same constant.
Is there any reason why energy should be preserved here?
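For reference, the normalization Mark describes can be sketched as follows. This is a minimal illustration, not the --nspsytune or ISO code. The tap values are made up, and band edges are handled by clamping (repeating the edge energy). Clamping is an assumption, chosen so that a constant input maps to the same constant everywhere.

```python
def normalize(taps):
    """Scale spreading-function taps so they sum to 1 (no energy removed)."""
    total = sum(taps)
    return [t / total for t in taps]


def spread_energy(energy, taps, center):
    """Convolve per-partition energies with spreading-function taps.

    taps[center] is the weight a masker applies to its own partition;
    taps[center + d] is the weight it applies d partitions above it.
    Partitions outside the band range are clamped to the nearest edge.
    """
    n = len(energy)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(taps):
            d = k - center                  # distance from masker up to partition i
            j = min(max(i - d, 0), n - 1)   # masker partition, clamped to the band
            acc += w * energy[j]
        out.append(acc)
    return out


# Hypothetical taps: steeper below the masker than above it.
raw_taps = [0.05, 0.3, 1.0, 0.5, 0.25, 0.1]
flat = spread_energy([2.0] * 8, normalize(raw_taps), center=2)
print(flat)  # every entry is 2.0 up to floating-point rounding
```

With normalized taps, a uniform reduction such as the 0.7 factor is a separate multiplication of the output. It can therefore be applied here or folded into a later gain.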
Mark> After the spreading function is applied, you can then adjust the
Mark> strength based on the tonality measure, and if you want a uniform
Mark> reduction by .7 (1.5 dB), it can be incorporated later.
Mark> (In fact, later it looks like you increase the masking by 2 dB, so all of
Mark> this could be done at that point?)
I attenuate by 3 dB here (0.7 != -1.5 dB) because Zwicker's book says
that the peak of the masking curve is 3 dB below the masker. I increase
the masking by 2 dB later because I tuned that value in listening
tests.
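Whether a factor of 0.7 corresponds to -1.5 dB or -3 dB depends on what it scales. If it scales a power (energy), 10*log10(0.7) is about -1.55 dB. If it scales an amplitude, 20*log10(0.7) is about -3.1 dB. The sketch below shows that arithmetic. It also shows why Mark suggests the two adjustments could live in one place: a -3 dB offset followed by a +2 dB boost combine into a single -1 dB factor. Which convention the encoder actually uses is not stated here, so treat this as illustration only.

```python
import math


def power_ratio_to_db(r):
    """dB value of a ratio between two powers (energies)."""
    return 10.0 * math.log10(r)


def amplitude_ratio_to_db(r):
    """dB value of a ratio between two amplitudes."""
    return 20.0 * math.log10(r)


def db_to_power_ratio(db):
    """Power (energy) factor corresponding to a level change in dB."""
    return 10.0 ** (db / 10.0)


print(round(power_ratio_to_db(0.7), 2))      # -1.55
print(round(amplitude_ratio_to_db(0.7), 2))  # -3.1
# Uniform gains multiply, so their dB values add: -3 dB then +2 dB is -1 dB.
combined = db_to_power_ratio(-3.0) * db_to_power_ratio(2.0)
print(round(power_ratio_to_db(combined), 6))  # -1.0
```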
Mark> It looks like a
Mark> comparison between peak and average energy within a partition band?
Mark> This is based on the theory that noise usually has a flat
Mark> spectrum, while pure tones would have sharp peaks?
Yes.
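A measure of the kind discussed here might look like the following sketch. It rests on an assumption about the general idea: peak energy relative to the mean energy of the spectral lines in one partition band. It is not the actual --nspsytune formula, and the function name and example spectra are hypothetical.

```python
import math


def peak_to_average_db(energies):
    """Largest line energy divided by the mean line energy in one band, in dB.

    A flat, noise-like band gives about 0 dB; a band dominated by a single
    line gives a large value. Mapping this to a tonality index is left out.
    """
    mean = sum(energies) / len(energies)
    if mean <= 0.0:
        return 0.0  # silent band: treat as not tonal
    return 10.0 * math.log10(max(energies) / mean)


noise_like = [1.0] * 8                                     # flat spectrum
one_tone = [0.01] * 7 + [10.0]                             # one strong line
five_tones = [2.0, 0.01, 2.0, 0.01, 2.0, 0.01, 2.0, 2.0]   # about the same total as one_tone

for name, band in [("noise", noise_like), ("1 tone", one_tone), ("5 tones", five_tones)]:
    print(name, round(peak_to_average_db(band), 1))
```

In this toy example, the five equal lines already sit much closer to the flat case than the single line does.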
Mark> The ISO measure of tonality is based on how stationary a signal is
Mark> in time. Thus the ISO formula is based on measuring the change in
Mark> energy and phase over 3 granules: if they don't change much over these
Mark> 3 granules, the signal is considered very tone-like, and if they
Mark> change a lot, noise-like.
Yes, I know. But in my experiments, the ISO tonality measure doesn't
work as I expect. What I want to detect is the following: Zwicker's
book says that a pure tone produces less masking than noise spread
across a critical band, even if both have the same total energy. And
if 5 or more pure tones of equal energy exist within a critical band,
they can be treated as noise.
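For comparison, consider the ISO time-based measure Mark describes (ISO 11172-3 psychoacoustic model 2). It predicts each spectral line's magnitude and phase from the two previous frames by linear extrapolation, then measures how far the actual value lies from that prediction. Below is a per-line sketch of that "unpredictability" value. The later steps of the model are omitted: energy weighting per partition and conversion to a tonality index.

```python
import cmath
import math


def unpredictability(x_prev2, x_prev1, x_now):
    """Unpredictability of one spectral line from three consecutive frames.

    Magnitude and phase are extrapolated linearly from the two previous
    frames. The result is 0 for a perfectly predictable (tone-like) line
    and at most 1 for a completely unpredictable (noise-like) one.
    """
    r1, phi1 = abs(x_prev1), cmath.phase(x_prev1)
    r2, phi2 = abs(x_prev2), cmath.phase(x_prev2)
    r_pred = 2.0 * r1 - r2
    phi_pred = 2.0 * phi1 - phi2
    predicted = r_pred * complex(math.cos(phi_pred), math.sin(phi_pred))
    denom = abs(x_now) + abs(r_pred)
    if denom == 0.0:
        return 0.0  # silent line: nothing to predict
    return abs(x_now - predicted) / denom


# A steady tone: constant magnitude, phase advancing by the same step each frame.
steady = [cmath.rect(1.0, 0.9 * t) for t in range(3)]
print(unpredictability(*steady))  # close to 0

# Magnitude and phase change erratically from frame to frame.
print(unpredictability(1.0 + 0j, 0.2j, -1.5 + 0j))  # about 1.0
```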
Mark> 3. Simultaneous masking: This is based on the theory that
Mark> two maskers, when added together, can (but not always?) give
Mark> more masking than the sum of their individual maskings.
Yes.
Mark> I haven't looked at Zwicker's book (Lincoln's reference for this)
Mark> but I imagine it is based on tests with just 1 or 2 maskers.
The description in Zwicker's book is mostly based on 2 maskers,
but the book also reports that someone confirmed the theory is
valid with 4 maskers as well. At the very least, it's worth
trying.
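One way to model "more masking than the sum" is a power-law addition rule, which some psychoacoustic studies use for combined maskers. Each threshold is raised to a power p < 1 before summing, and the sum is then raised to 1/p. This is a swapped-in model for illustration only. It is not the rule from Zwicker's book or from --nspsytune, and p = 0.5 is an arbitrary value, not a tuned one.

```python
def combine_maskers(thresholds, p=0.5):
    """Combine masking thresholds (linear power units) with a power-law rule.

    p = 1 is plain addition. With p < 1 the combination exceeds the plain
    sum whenever two or more maskers are non-zero, and the excess is largest
    when their levels are similar.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return sum(t ** p for t in thresholds) ** (1.0 / p)


print(combine_maskers([1.0, 1.0], p=1.0))  # 2.0: plain sum, +3 dB over one masker
print(combine_maskers([1.0, 1.0]))         # 4.0: +6 dB over one masker
print(combine_maskers([1.0] * 4))          # 16.0: four equal maskers, +12 dB
```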
Mark> 4. Your point #5 above is very similar to the ISO formula
Mark> which is implemented via the "minval" threshold. The strength
Mark> of the maskings (computed based on tonality) is not allowed
Mark> to exceed a certain threshold. The ISO formula is a little
Mark> more complicated in that this threshold depends on frequency:
Mark> for low frequencies, minval is more restrictive (resulting
Mark> in less masking than would be used w/o minval).
I don't understand why minval is necessary.
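As Mark describes it, minval caps how much masking the tonality estimate may grant. The sketch below follows the style of ISO psychoacoustic model 2. The required signal-to-mask ratio is interpolated between a tone-masking-noise offset and a noise-masking-tone offset, then floored at minval. The 18 dB and 6 dB offsets are assumptions (the values commonly quoted for model 2), and the minval values are hypothetical.

```python
TMN_DB = 18.0  # assumed tone-masking-noise offset (dB)
NMT_DB = 6.0   # assumed noise-masking-tone offset (dB)


def required_smr_db(tonality, minval_db):
    """Signal-to-mask ratio required in one partition, in dB.

    tonality is in [0, 1] (1 = pure tone). A larger result means a lower
    threshold, i.e. less masking; minval_db acts as a floor on that ratio.
    """
    smr = tonality * TMN_DB + (1.0 - tonality) * NMT_DB
    return max(minval_db, smr)


def masking_threshold(spread_energy, tonality, minval_db):
    """Threshold energy for a partition after applying the SMR offset."""
    return spread_energy * 10.0 ** (-required_smr_db(tonality, minval_db) / 10.0)


# A noise-like partition (tonality 0) with a hypothetical minval per band:
print(required_smr_db(0.0, minval_db=20.0))  # 20.0: low band, minval wins
print(required_smr_db(0.0, minval_db=0.0))   # 6.0: high band, tonality decides
```

Without the floor, a partition judged noise-like always gets the full NMT offset. With the floor, low partitions keep at least minval dB of SMR regardless of the tonality estimate.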
Mark> The real goal is to find something that will increase
Mark> the bitrate of vbrtest.wav, without disturbing the bitrate
Mark> of other samples which sound ok with VBR.
Yes, that is what I want to do, and I think it is partly achieved,
although I agree there is room for improvement.
I think the original -V 4 is not perfect on many
inputs. I can hear subtle noise that doesn't disappear even
with -V2 (e.g. vioo10_2.wav in SQAM). With -V 4 --nspsytune,
this noise disappears completely. Increasing NTHRE produces the
same kind of noise with --nspsytune, so I think this is the same
problem as vbrtest. So I think the increase in bitrate is not so bad.
I would like to hear other people's opinions about the quality.
BTW, I just ran some listening tests with MAXNOISE disabled
(by changing NTHRE to 100000). This produces files whose bitrate is, on
average, a little lower than without --nspsytune, and I cannot hear any
quality degradation compared to encoding without --nspsytune. So I don't
think I am doing anything completely wrong.
Mark> I've played with MAXNOISE and do not really like it since it is based
Mark> on inaccurate energy estimates of single MDCT coefficients, rather
Mark> than some kind of averaging. For example, take a signal with a very
Mark> large N'th coefficient. A tiny change in this signal can easily move
Mark> the energy so that it is now 50%/50% between N and N+1 coefficients.
Mark> The thing that doesn't change is the total energy. Thus I think some
Mark> type of smoothing needs to be done. A better solution might be to
Mark> take the maximum of a moving average of 5 coefficients over all the
Mark> coefficients in the band.
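Below is a minimal sketch of the difference, using Mark's proposed 5-coefficient moving average. It assumes the MAXNOISE estimate is simply the largest single-coefficient energy in the band; the real code may differ. The helper names and the 10-coefficient test band are hypothetical.

```python
import math


def peak_energy(coeffs):
    """Assumed MAXNOISE-style estimate: the largest single-coefficient energy."""
    return max(c * c for c in coeffs)


def max_moving_average(coeffs, width=5):
    """Mark's proposal: average the energy over a sliding window of `width`
    coefficients, then take the maximum over the band."""
    energies = [c * c for c in coeffs]
    if len(energies) <= width:
        return sum(energies) / len(energies)
    best = 0.0
    for start in range(len(energies) - width + 1):
        best = max(best, sum(energies[start:start + width]) / width)
    return best


one_line = [0.0] * 10
one_line[4] = 1.0                      # all energy in coefficient N
split = [0.0] * 10
split[4] = split[5] = math.sqrt(0.5)   # same energy split 50%/50% between N and N+1

print(peak_energy(one_line), peak_energy(split))                # 1.0 vs about 0.5: a 3 dB drop
print(max_moving_average(one_line), max_moving_average(split))  # both about 0.2
```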
Why is energy important here? I think the more important thing is that
the peak noise must not exceed the masking. I cannot explain the reason yet,
but in my experiment using an FFT, when a pure tone is located 50%/50%
between coefficients N and N+1, the peak coefficient's energy drops by
0.9 dB (I still have to think about the reason, but this may be because
of the window function).
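The size of that drop does depend on the analysis window; it is the window's scalloping loss. The sketch below measures it directly with a plain DFT. It places a complex tone exactly on a bin, then halfway between two bins, and compares the largest bin energy in each case. The 256-point length, the bin position, and the two windows are arbitrary choices. The sketch does not reproduce the window or transform used in the experiment above, so it is not expected to give 0.9 dB. For a rectangular window it should print about 3.92, and for a Hann window about 1.42.

```python
import cmath
import math


def bin_energy(signal, k):
    """Energy of DFT bin k of a (possibly complex) signal."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return abs(acc) ** 2


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def peak_bin_energy(window, freq_bins):
    """Largest bin energy near a windowed complex tone at freq_bins (in bins)."""
    n = len(window)
    sig = [w * cmath.exp(2j * math.pi * freq_bins * i / n) for i, w in enumerate(window)]
    centre = int(freq_bins)
    return max(bin_energy(sig, k) for k in range(centre - 2, centre + 3))


def half_bin_drop_db(window, bin_index=32):
    """dB drop of the peak bin when the tone moves half a bin off-centre."""
    on_bin = peak_bin_energy(window, float(bin_index))
    between = peak_bin_energy(window, bin_index + 0.5)
    return 10.0 * math.log10(on_bin / between)


print(round(half_bin_drop_db([1.0] * 256), 2))  # rectangular window
print(round(half_bin_drop_db(hann(256)), 2))    # Hann window
```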
--
Naoki Shibata e-mail: [EMAIL PROTECTED]
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )