>
> You can easily find out
>
> 1.414216 / 4.315976e-05 = 32767.003338294744919
>
> And xr is an amplitude, proportional to the input file amplitude,
> not its square (energy).
>
I don't think it was ever in doubt that xr is an amplitude. xr(i)
is the i'th MDCT coefficient in the expansion of the original
waveform in terms of cosine functions with wave number i.
> I think we should use energy difference in db, so the noise formula
> should be
> sum (10 log (xr^2 - ix^(8/3)))
> or
> 10 log (sum(xr^2 - ix^(8/3)))
>
> and I think the latest quantization method will do the best approximation
> even when we use this noise formula.
Here is where the 'normal' formula comes from. It is used in
MPEG (all layers, and AAC) as well as in any book on error analysis:
f(t) = original waveform
g(t) = waveform after compression and decompression
e(t) = error
e(t) = f(t) - g(t)
Denote the MDCT or Fourier transform of e(t) by e'(k), where
k = wave number, e.g. e'(k) = FFT(e(t)). The transform is linear, so
e'(k) = f'(k) - g'(k).
The power spectrum of the error is given by:
(e'(k))^2 = (f'(k)-g'(k))^2
Which is the usual formula. It is measuring the power spectrum of
the error signal, as opposed to the suggestion above, which would be:
f'^2 - g'^2
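To make the distinction concrete, here is a small sketch in Python. It is not LAME code: the names f_hat and g_hat and all the coefficient values are made up purely for illustration. It evaluates both definitions on a few coefficients:

```python
# Hypothetical spectral coefficients; values chosen only for illustration.
f_hat = [1.0, -0.5, 0.25, 0.8]   # original, f'(k)
g_hat = [0.9, 0.5, 0.30, 0.8]    # decoded,  g'(k)

def energy_of_difference(f, g):
    """Per-coefficient (f'(k) - g'(k))^2: power spectrum of the error."""
    return [(a - b) ** 2 for a, b in zip(f, g)]

def difference_of_energies(f, g):
    """Per-coefficient f'(k)^2 - g'(k)^2: the alternative definition."""
    return [a * a - b * b for a, b in zip(f, g)]

err = energy_of_difference(f_hat, g_hat)
diff = difference_of_energies(f_hat, g_hat)

for k, (e, d) in enumerate(zip(err, diff)):
    print(f"k={k}  (f'-g')^2 = {e:.4f}   f'^2 - g'^2 = {d:+.4f}")
```

In this made-up data, the coefficient at k=1 has its sign flipped: the energy of the difference is 1.0, but the difference of energies is 0, so that error would go unnoticed. At k=2 the difference of energies is negative, so its logarithm is not defined, while the energy of the difference is never negative.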
The real question is which definition is consistent with the
masking computed by the psy model. When experimentalists say
that under certain conditions, there will be (for example)
5 dB of masking, they mean the following:
f(t) + e(t) sounds the same as f(t) for any signal e(t) whose energy is
at most 5 dB.
The masking is measured as the energy of the perturbation e(t).
Take a look at any of the descriptions of masking on the web.
They will give plots which show the energy at which a test
tone e(t) becomes audible in the presence of a masker f(t).
They always give the masking in terms of the energy of e(t)
as a function of frequency.
So now let g(t) be our encoded signal. Is g(t) going to sound the
same as f(t)? Just define: e(t) = f(t) - g(t) and then
compute the energy in e(t). If it is less than 5 dB, it should
be inaudible. This energy is given exactly by:
(f'(k) - g'(k) )^2
Which is how we compute the noise in quantize.c. The whole point of
quantize.c is to choose scalefactors which get this noise under
the allowed masking. Thus I think it makes most sense if the
integer quantization is also optimized to reduce this definition
of noise.
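As a rough sketch of that check (again in Python, and again not the actual quantize.c code): the function names, the band edges, the coefficient values and the allowed masking values below are all hypothetical. The noise in each band is the summed energy of the difference, and it is compared in dB against the masking allowed for that band:

```python
import math

def band_noise(xr, xr_decoded, band_edges):
    """Sum of (xr - xr_decoded)^2 over each band [start, end)."""
    noise = []
    for start, end in zip(band_edges[:-1], band_edges[1:]):
        pairs = zip(xr[start:end], xr_decoded[start:end])
        noise.append(sum((a - b) ** 2 for a, b in pairs))
    return noise

def noise_over_mask_db(noise, allowed):
    """10*log10(noise/allowed) per band; above 0 dB the noise exceeds the masking."""
    out = []
    for n, m in zip(noise, allowed):
        out.append(10.0 * math.log10(n / m) if n > 0 else float("-inf"))
    return out

# Hypothetical data: six coefficients split into three bands of two.
xr         = [4.0, -3.0, 2.0, 1.0, 0.5, -0.25]   # original coefficients
xr_decoded = [3.8, -3.1, 2.0, 1.4, 0.0, 0.0]     # after quantize/dequantize
band_edges = [0, 2, 4, 6]
allowed    = [0.1, 0.1, 0.5]                     # allowed noise energy per band

noise = band_noise(xr, xr_decoded, band_edges)
over = noise_over_mask_db(noise, allowed)
audible = [db > 0.0 for db in over]

for b, (n, db) in enumerate(zip(noise, over)):
    print(f"band {b}: noise = {n:.4f}, noise/mask = {db:+.2f} dB")
```

With these made-up numbers only the middle band ends up above its allowed masking. In an encoder, that is the band whose scalefactor would need adjusting.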
Mark
Furthermore, the masking computed by the psy model is given in
terms of how much noise is masked.
It really depends on what you think is most important.
No one has yet explained why looking at the difference in energies
is better than looking at the energy of the difference.
This goes against all the ISO docs (layer I, layer II, layer III and
AAC) and normal DSP conventions. Thus it is not going into LAME unless
I also think we need to minimize the noise in dB, but measured as
the energy of the difference, where
the difference = xr - ix^(4/3)
> ---
> Takehiro TOMINAGA // may the source be with you!
> --
> MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
>