Hello Jan,
On 14-Mai-99, Jan Peman wrote:
> The short_block switching criterion used in LAME is from the ISO code/docs:
>
> pe (perceptual entropy) > 1800.
>
> I'm pretty sure FhG does not use this formula, since they sometimes
> switch to short_blocks at frames where the pe is quite low. So the
> questions are:
>
> 1. is there a more accurate pe formula than the one in the ISO docs?
> 2. is there a better short_block switching criterion?
>
>
> Some MPEG1 and MPEG2 papers claim the pre-echo detection is from
> B. Edler, "Coding of audio signals with overlapping block
> transform and adaptive window functions" (in German) Frequenz,
> vol 43 1989 pp 252-256.
>
> Maybe someone in Germany could check this out?
> If this paper just gives the pe>1800 formula, it's not that useful,
> but maybe it contains something better?
Hi, I'm from Germany and I've checked the paper out.
In section 4 ("Control of the window adaption"), it reads:
(please note that this is my translation from German)
"Considering the ratio of the signal energies in two consecutive blocks,
one notices that this ratio increases if the amplitude of the input
signal rises rapidly. If the amplitude of the input signal remains
approximately constant, the ratio takes values of about 1. For simplicity,
the energies may be replaced by the sums of the absolute values (| |) of the
samples inside a block. The detectability of rapid amplitude jumps can even be
improved if, instead of the absolute values of the samples, the absolute
values of the outputs of a simple high-pass filter are summed. For the
high-pass filtering, the differences between consecutive samples are formed:
(use TeX...)
$$ c_1(v) = {\sum_{n=0}^{N-1} |d(vN + n)| \over \sum_{n=0}^{N-1} |d(vN - N + n)|} $$
with
$$ d(n) = x(n) - x(n - 1) $$
Very short impulses and amplitude jumps at the end of a block contribute only
little to the overall energy of the block and are hard to detect with
c1(v). Therefore, a second criterion is introduced, which relates the maximum
absolute value inside a block to the average of the absolute values:
(use TeX...)
$$ c_2(v) = {\max_{0 \le n \le N-1}\{|x(vN + n)|\} \cdot N \over \sum_{n=0}^{N-1} |x(vN + n)|} $$
These two criteria complement each other and can control the windowing,
switching to short window lengths when either of the two values
exceeds a certain threshold."
Unfortunately, no threshold values are given, so they will have to be found
by experiment. However, a figure in the paper suggests a threshold of about
100 for c1(v) (this might be a good starting point). For c2(v),
no figure is given, so you have to guess.
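To make the two criteria concrete, here is a minimal sketch in Python of how
I read the formulas. It is not taken from the paper or from LAME. The function
names, the choice x(-1) = 0, and the handling of silent blocks are my own
assumptions. The c1 default of 100 is only the value read off the figure, and
the c2 threshold must be supplied by the caller because the paper gives none.

```python
def highpass_diff(x):
    """d(n) = x(n) - x(n-1); x(-1) is taken as 0 (an assumption)."""
    return [x[n] - (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def c1(x, v, N):
    """Sum of |d| over block v divided by the sum of |d| over block v-1."""
    if v < 1:
        raise ValueError("c1 needs a preceding block, so v must be >= 1")
    d = highpass_diff(x)
    cur = sum(abs(s) for s in d[v * N:(v + 1) * N])
    prev = sum(abs(s) for s in d[(v - 1) * N:v * N])
    if prev == 0.0:
        # Silent previous block (assumption): any signal counts as a jump.
        return float("inf") if cur > 0.0 else 1.0
    return cur / prev


def c2(x, v, N):
    """Maximum |x| in block v divided by the mean |x| in block v."""
    block = [abs(s) for s in x[v * N:(v + 1) * N]]
    total = sum(block)
    if total == 0.0:
        return 1.0  # silent block (assumption): treat like a flat signal
    return max(block) * N / total


def use_short_windows(x, v, N, c2_threshold, c1_threshold=100.0):
    """True if either criterion exceeds its threshold.

    c1_threshold=100 is only the starting point suggested by the figure;
    c2_threshold has no value in the paper and must be tuned by hand.
    """
    return c1(x, v, N) > c1_threshold or c2(x, v, N) > c2_threshold
```

For a steady signal both values stay near 1. A block that starts loud after a
nearly silent one drives c1 up, and a single spike in an otherwise quiet block
drives c2 towards N.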
Another notable point is that the paper uses a transform of
length N=512 with a sine window of length 2N=1024. When an impulse is
detected, four short windows of length N=128 and a sine-square window with an
overlap of L=32 samples are used.
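As a small sanity check on the long-block numbers, the sketch below builds a
sine window of length 2N=1024. The exact window definition is not quoted
above, so the usual form w(n) = sin(pi * (n + 0.5) / (2N)) is my assumption,
as are the names in the code. It verifies that the two overlapping halves are
power-complementary, and that four blocks of 128 samples cover one long block.

```python
import math


def sine_window(length):
    """Assumed form: w(n) = sin(pi * (n + 0.5) / length), n = 0..length-1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


N_LONG = 512    # transform length from the paper
N_SHORT = 128   # short block length from the paper

w = sine_window(2 * N_LONG)

# With 50% overlap, w(n)^2 + w(n + N)^2 should equal 1 for every n.
max_err = max(abs(w[n] ** 2 + w[n + N_LONG] ** 2 - 1.0) for n in range(N_LONG))

# Four short blocks span exactly one long block.
short_blocks_per_long = N_LONG // N_SHORT
```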
Regards,
andre
--
I like work ... I can sit and watch it for hours.
+--------------------+----------------------------------------------------+
| Andre' Osterhues | e-mail: [EMAIL PROTECTED] |
| Meitnerweg 13 | www: http://studserver.uni-dortmund.de/~su0583/ |
| D-44227 Dortmund | "there isn't any reason, there isn't any sense |
| Tel.: 0231/7519501 | nothing else is happening, this is where you are" |
+--------------------+----------------------------------------------------+
--
MP3 ENCODER mailing list