E. Zwicker and H. Fastl: Psychoacoustics: Facts and Models.

Let me elaborate just a little on this tonality estimation.

First of all, why do we need tonality estimation? We need it because a
non-tonal sound generates more masking than a tonal one, so we need
this estimate to compute the amount of masking.
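
To make this concrete, here is a minimal Python sketch of how a tonality
index could be turned into a masking offset. The function name is
hypothetical, and the 29 dB (tone masking noise) and 6 dB (noise masking
tone) defaults are illustrative values in the style of the ISO model 2
demonstration, not necessarily what LAME uses.

```python
def masking_offset_db(tonality, tmn_db=29.0, nmt_db=6.0):
    """Offset (in dB below the masker) at which the masking threshold sits.

    tonality: 0.0 for a noise-like band, 1.0 for a tone-like band.
    tmn_db / nmt_db: assumed offsets for pure tone and pure noise maskers.
    """
    # Clamp so that a slightly out-of-range estimate cannot extrapolate.
    t = min(max(tonality, 0.0), 1.0)
    # Linear blend between the noise-like and the tone-like offset.
    return t * tmn_db + (1.0 - t) * nmt_db
```

A smaller offset means the threshold sits closer to the masker, i.e. more
masking, which is why a noise-like band is allowed more quantization noise
than a tonal one.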

Gpsycho: based on the ISO model 2 demonstration. It uses predictability. If
the amplitude and position of a sound can be accurately predicted from the
data of the 2 previous granules, then the sound is considered tonal. It is a
good idea, but the problem is that it can't detect the tonality of a sound
before the 3rd granule in which the sound is present. So the first 2
granules are wrong.
It's a little like the ISO short block estimation, where the ISO model needed
data from the previous granule, and so was switching 1 granule too late.
Perhaps this could be fixed by running the tonality estimation 2 granules
ahead, and when a sound is detected as tonal, marking it as tonal in the 2
previous granules as well (as it has obviously been tonal for 2 granules
already).
The second problem is that in the case of a tone with a rapid change in
frequency, like a sine sweep, we miss it every time.
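
The delay, and the suggested fix, can be sketched in a few lines of Python.
This is only an illustration under assumptions, not gpsycho's actual code:
it follows one FFT bin across granules, extrapolates magnitude and phase
linearly from the 2 previous granules, and the names `unpredictability`,
`tonal_flags` and the 0.1 threshold are all made up for the example.

```python
import cmath

def unpredictability(prev2, prev1, cur):
    """0.0 = bin perfectly predicted from the 2 previous granules, 1.0 = not at all."""
    # Linear extrapolation of magnitude and phase.
    r_pred = 2.0 * abs(prev1) - abs(prev2)
    phi_pred = 2.0 * cmath.phase(prev1) - cmath.phase(prev2)
    predicted = cmath.rect(r_pred, phi_pred)
    denom = abs(cur) + abs(r_pred)
    if denom == 0.0:
        return 1.0  # silence: nothing to predict, do not call it tonal
    return abs(cur - predicted) / denom

def tonal_flags(bins, threshold=0.1, backfill=True):
    """bins: complex value of one FFT bin, one entry per granule."""
    flags = [False] * len(bins)
    for g in range(2, len(bins)):
        if unpredictability(bins[g - 2], bins[g - 1], bins[g]) < threshold:
            flags[g] = True
            if backfill:
                # The tone was needed in both previous granules to be
                # predictable, so mark them as tonal too.
                flags[g - 1] = True
                flags[g - 2] = True
    return flags

# A steady tone that starts at granule 3 (3 silent granules first).
tone = [0j] * 3 + [cmath.rect(1.0, 0.3 + 0.7 * n) for n in range(5)]
late = tonal_flags(tone, backfill=False)   # tone only flagged from its 3rd granule
fixed = tonal_flags(tone, backfill=True)   # its first 2 granules are flagged as well
```

Note that the backfill needs the encoder to look 2 granules ahead, so it
costs extra delay, and it does nothing for the sine sweep case.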

Nspsytune: based on the same kind of ideas as the ISO model 1 demonstration.
(In the case of nspsytune I'm not really sure; I hope that Naoki will
correct me if I'm wrong.)
It uses peak detection. If the amplitude at a frequency is higher than that
of its neighbours by a threshold, then it's considered tonal. There is no
delay like in gpsycho, but if several tones are close enough together, it
will miss them (could that be the case with Fatboy?).
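
Here is a rough Python sketch of that kind of peak detection, to show why
close tones get missed. It is not nspsytune's code: the function name, the
7 dB threshold and the neighbourhood of 2 bins on each side are assumptions
picked for the example.

```python
def tonal_peaks(magnitudes_db, threshold_db=7.0, reach=2):
    """Indices of bins exceeding all neighbours within `reach` by threshold_db."""
    peaks = []
    for k in range(reach, len(magnitudes_db) - reach):
        centre = magnitudes_db[k]
        neighbours = [magnitudes_db[k + j]
                      for j in range(-reach, reach + 1) if j != 0]
        # Tonal only if the bin stands out from every neighbour.
        if all(centre - m >= threshold_db for m in neighbours):
            peaks.append(k)
    return peaks

isolated = tonal_peaks([0, 0, 0, 20, 0, 0, 0])    # one tone, found at bin 3
crowded = tonal_peaks([0, 0, 20, 0, 20, 0, 0])    # two tones 2 bins apart
```

In the second case each tone sits inside the other's neighbourhood, so
neither exceeds all of its neighbours by the threshold and both are missed.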

So the 2 methods are different, and right now neither of them works perfectly.
Perhaps a corrected version of the first method (as suggested above), or a
combination of the 2 methods, would be accurate enough...

Btw, I'd suggest you have a look at the references on the LAME website; I
added references to papers about this tonality estimation.


Regards,

----
Gabriel Bouvigne
www.mp3-tech.org




----- Original Message -----
From: reinhard
To: [EMAIL PROTECTED]
Sent: Monday, January 28, 2002 10:58 AM
Subject: Re: [MP3 ENCODER] MS Stereo


>One of the biggest differences between l3psycho_anal_ns and
>l3psycho_anal is exactly what you are asking about - how they estimate
>the tonality index.  One is a tweaked and cleaned up version of the
>MPEG1/2 recommendation:  the predictability of the energy in each
>band over several granules.  I believe it comes from thesis work
>of one of the creators of MP3.  The other is based on how peaked the
>spectrum is, and uses data just from a single granule.  Naoki wrote
>it based on data in Zwicker's book.
Zwicker's book??  Would you tell me the name of the book,
or more information about l3psycho_anal_ns?

>Keep in mind that all the models are very crude estimates,
>and the output should be considered as a rough guide to the noise
>shaping algorithms rather than absolute truth.

_______________________________________________
mp3encoder mailing list
[EMAIL PROTECTED]
http://minnie.tuhs.org/mailman/listinfo/mp3encoder
