Hi Davide,
thanks for the information!
> >
> > For Layer 3, why bother with the "polyphase" filterbanks? Why can't
> > you process the whole frame with a single MDCT - then start removing
> > and quantizing the MDCT spectral coefficients based on the psy model?
> >
>
> MPEG-1 Layer 3 uses the same structure as the previous layers (subband filter...) and
> adds new, more powerful "tools" (such as the MDCT transform...). The subband filter is
> used when we quantize the "time" samples, divided into different frequency groups:
> in fact, the newer compression algorithm "AAC" (MPEG-2 NBC/MPEG-4) uses only the MDCT
> transform, without subband filtering.
>
So the filterbank is only in layer-3 because it was left over from
layer-1 & layer-2? It does seem strange to do a Hann windowed DCT (the
filter bank) and then process the output of that DCT with another
DCT (as I understand it, the MDCT is a windowed DCT with a fancy
overlapped orthogonal window).
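
To make the "overlapped window" idea concrete, here is a minimal sketch of a windowed MDCT with 50% overlap, written in plain Python. It is not the Layer 3 or AAC filterbank: the block size, the sine window, and all function names (sine_window, mdct, imdct, analyze, synthesize) are my own assumptions for illustration. It only shows that each block of 2N samples yields N coefficients, and that overlap-adding the inverse transforms cancels the aliasing.

```python
import math

def sine_window(N):
    # Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    # N coefficients from 2N (already windowed) samples.
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    # 2N aliased samples from N coefficients (2/N scaling for the windowed case).
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def analyze(x, N):
    # Assumes len(x) is a multiple of N. Pad with N zeros on each side so
    # every input sample is covered by two overlapping blocks.
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N
    frames = []
    for start in range(0, len(padded) - N, N):
        block = [w[n] * padded[start + n] for n in range(2 * N)]
        frames.append(mdct(block, N))
    return frames

def synthesize(frames, N):
    # Window each inverse transform again and overlap-add with hop N.
    w = sine_window(N)
    out = [0.0] * ((len(frames) + 1) * N)
    for t, coeffs in enumerate(frames):
        y = imdct(coeffs, N)
        for n in range(2 * N):
            out[t * N + n] += w[n] * y[n]
    return out[N:-N]  # drop the zero padding

if __name__ == "__main__":
    x = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
    y = synthesize(analyze(x, 8), 8)
    print(max(abs(a - b) for a, b in zip(x, y)))
```

The direct sums are O(N^2) per block, which is fine for a demonstration but not how a real encoder would compute the transform.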
And what about the FFT for the psy-model? If, as in MPEG4, the
whole frame is processed with an MDCT, couldn't these MDCT spectral
coefficients be used for the psy-model, rather than requiring
an additional FFT? (And does anyone know why an FFT is needed
for this stage, when everything else is a DCT? Is it important to have
sines and cosines for the psy-model?)
>
> There are four different cases of masking paradigms:
> 1. Noise-masking-tone
> 2. Tone-masking-tone
> 3. Noise-masking-noise
> 4. Tone-masking-noise
>
> In our case, the maskee (the element masked by another element, "the masker") is the
> quantization noise, so we are interested only in cases 3 and 4.
> The problem is that these two cases have received little attention in psychoacoustic
> research, due to the difficulty of such measurements.
> The following observations (together with the measurement results from the first two
> cases) help us to determine the offset:
> noise is a better masker than a tone, and a tone is an easier maskee than noise.
>
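
As a rough illustration of how such an offset could be applied, here is a small Python sketch that interpolates between a tone-masker offset and a noise-masker offset using a tonality estimate between 0 (noise-like) and 1 (tone-like). The function names and the 18 dB and 6 dB defaults are illustrative placeholders, not values from this thread; the only thing taken from the discussion above is that a noise masker should need a smaller offset than a tone masker.

```python
def masking_offset_db(tonality, tone_masker_db=18.0, noise_masker_db=6.0):
    # tonality: 0.0 = noise-like masker, 1.0 = tone-like masker.
    # The default offsets are placeholders chosen only so that noise
    # masks better (smaller offset) than a tone.
    if not 0.0 <= tonality <= 1.0:
        raise ValueError("tonality must be between 0 and 1")
    return tonality * tone_masker_db + (1.0 - tonality) * noise_masker_db

def allowed_noise_db(masker_level_db, tonality):
    # Quantization noise below this level is assumed to be masked.
    return masker_level_db - masking_offset_db(tonality)

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, allowed_noise_db(60.0, t))
```

With these placeholder numbers, a noise-like masker at a given level tolerates more quantization noise than a tone-like masker at the same level.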
This is a very intimidating problem! Maybe FhG does
deserve our hard-earned money! I'd buy their Linux encoder if
it didn't cost 4x more than the win95 version.
One nice thing about an open source encoder: we could get many
volunteers to do hearing tests and tweak the algorithm. So in a sense
it would "evolve" like Mike suggested. I think this software model
would eventually lead to an unbeatable encoder. But maybe the
required hearing tests (both figuring out what to test and how to test
it) are so complex that they can't really be done by a bunch of Unix
hackers with cheap sound cards and computer speakers?
Mark