E. Zwicker, Psychoacoustics: Facts and Models.
Let me elaborate a little on this tonality estimation. First of all, why do we need tonality estimation? A non-tonal (noise-like) sound generates more masking than a tonal one, so we need this estimation to compute the amount of masking.

Gpsycho: based on the ISO model 2 demonstration. It uses predictability: if the amplitude and phase of a spectral component can be accurately predicted from the data of the 2 previous granules, the component is considered tonal. It is a good idea, but the problem is that it can't detect the tonality of a sound before the 3rd granule in which that sound is present, so the first 2 granules are misclassified. It's a little like the ISO short-block estimation, where the ISO model needed data from the previous granule and therefore switched 1 granule too late. Perhaps this could be fixed by running the tonality estimation 2 granules ahead and, when a sound is detected as tonal, also marking it as tonal in the 2 previous granules (since it has obviously already been tonal for 2 granules). The second problem is that a tonal component whose frequency changes rapidly, like a sine sweep, is missed every time, because it does not stay in the same spectral line long enough to be predicted.

Nspsytune: based on the same kind of ideas as the ISO model 1 demonstration (in the case of nspsytune I'm not really sure; I hope that Naoki will correct me if I'm wrong). It uses peak detection: if the amplitude at a frequency is higher than that of its neighbours by some threshold, it's considered tonal. There is no delay as in gpsycho, but if several tones are close enough together, it will miss them (could that be the case with Fatboy?).

So the 2 methods are different, and right now neither of them works perfectly. Perhaps the first method, corrected as suggested, or a combination of the 2 methods would be accurate enough...

By the way, I'd suggest having a look at the references on the LAME website; I added references to papers about this tonality estimation.
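To make the two approaches concrete, here is a minimal Python sketch written for illustration only; it is not gpsycho or nspsytune code, and every name in it is hypothetical. `unpredictability` follows the ISO model 2 idea of extrapolating the magnitude and phase of one spectral line linearly from the 2 previous granules. `peak_tonal_lines` is a simplified single-granule peak picker in the spirit of ISO model 1; its 7 dB threshold and neighbourhood width are assumed values. `backfill_tonal` is a hypothetical helper for the 2-granule back-marking fix suggested above.

```python
import cmath
import math


def unpredictability(prev2: complex, prev1: complex, cur: complex) -> float:
    """ISO-model-2-style unpredictability of one spectral line.

    prev2, prev1, cur are the complex values of the same line in granules
    t-2, t-1 and t. Magnitude and phase are extrapolated linearly from the
    two previous granules. Returns 0.0 for a perfect prediction (tonal) and
    up to 1.0 for an unpredictable (noise-like) line.
    """
    r1, f1 = abs(prev1), cmath.phase(prev1)
    r2, f2 = abs(prev2), cmath.phase(prev2)
    r_pred = 2.0 * r1 - r2  # linear extrapolation of the magnitude
    f_pred = 2.0 * f1 - f2  # linear extrapolation of the phase
    predicted = r_pred * complex(math.cos(f_pred), math.sin(f_pred))
    denom = abs(cur) + abs(r_pred)
    if denom == 0.0:
        return 1.0  # silence: treated as noise-like (an assumption)
    return abs(cur - predicted) / denom


def peak_tonal_lines(mag_db, width=3, threshold_db=7.0):
    """Single-granule peak picking, loosely in the spirit of ISO model 1.

    Line k is tonal if it is a local maximum and exceeds every line at
    distance 2..width by at least threshold_db. Both parameters are
    assumptions, not values taken from nspsytune.
    """
    n = len(mag_db)
    tonal = []
    for k in range(1, n - 1):
        if not (mag_db[k] > mag_db[k - 1] and mag_db[k] >= mag_db[k + 1]):
            continue  # not a local maximum
        neighbours = [mag_db[i] for j in range(2, width + 1)
                      for i in (k - j, k + j) if 0 <= i < n]
        if all(mag_db[k] - m >= threshold_db for m in neighbours):
            tonal.append(k)
    return tonal


def backfill_tonal(flags, lookback=2):
    """Hypothetical fix: when granule t is found tonal, also mark the
    `lookback` preceding granules tonal, because the predictor needs that
    many granules of history before it can recognise a tone."""
    out = list(flags)
    for t, is_tonal in enumerate(flags):
        if is_tonal:
            for p in range(max(0, t - lookback), t):
                out[p] = True
    return out


if __name__ == "__main__":
    # A steady tone on one line, starting at granule 0 (two silent granules before it).
    line = [0j, 0j] + [cmath.exp(1j * (0.3 + 1.1 * t)) for t in range(5)]
    c = [unpredictability(line[t - 2], line[t - 1], line[t])
         for t in range(2, len(line))]
    print([round(x, 2) for x in c])  # the first two granules are not predicted
    print(backfill_tonal([x < 0.1 for x in c]))
```

For the steady tone, the unpredictability comes out at about 1.0 and 0.5 for the first two granules and close to 0.0 afterwards, which is the 2-granule delay described above; `backfill_tonal` then marks those first two granules as tonal too. In the peak picker, two strong peaks only two lines apart each fail the threshold test against the other, which is the close-tones weakness mentioned for nspsytune.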
Regards,
----
Gabriel Bouvigne
www.mp3-tech.org

----- Original Message -----
From: reinhard
To: [EMAIL PROTECTED]
Sent: Monday, January 28, 2002 10:58 AM
Subject: Re: [MP3 ENCODER] MS Stereo

>One of the biggest differences between l3psycho_anal_ns and
>l3psyco_anal is exactly what you are asking about: how they estimate
>the tonality index. One is a tweaked and cleaned-up version of the
>MPEG1/2 recommendation: the predictability of the energy in each
>band over several granules. I believe it comes from thesis work
>of one of the creators of MP3. The other is based on how peaked the
>spectrum is, and uses data from just a single granule. Naoki wrote
>it based on data in Zwicker's book.

Zwicker's book?? Would you tell me the name of the book, or give me more information about l3psycho_anal_ns?

>Keep in mind that all the models are very crude estimates,
>and the output should be considered as a rough guide to the noise
>shaping algorithms rather than absolute truth.