>
> Okay, all, maybe someone with more documentation can help me.
>
> So, I've been experimenting with smaller FFTs for the long window inside the
> psychoacoustic model, like 768=3*256 and 576=9*64. My thinking here is that
> it would use only _relevant_ information for computing noise masking
> thresholds in the current frame. Or is this ludicrous?
>
I think this is a good idea, but I would stick with 768. The FFTs
are windowed by a function that tapers to zero at the end points,
so you want a window a little bigger than the 576 samples
that are being analyzed.
>
> The good news: the code is much faster doing a mixed-radix length 576 DFT
> than the current 1024 point FHT implementation.
>
> The bad news: static data in psy_data (tables.c) is relative to spectral
> lines in a 1024 point DFT. They are totally invalid for the shorter (576 or
> 768) long window. Does anyone know what functions were used to create the
> static data, or how to re-calibrate it?
>
Yeah, this will take some work, but it is just a mapping from the FFT
energies (equally spaced in frequency) to 64 partitions
(equally spaced in 'barks', which are supposed to represent bands of equal
importance to the human ear). The calculations are done in
each partition, and then the partitions are lumped into 21
scalefactor bands (also given by tables).
The paper "A nonlinear psychoacoustic model applied to the ISO MPEG
Layer 3 Coder" on www.mp3tech.org gives formulas that could be used to
replace the partition tables & spreading function. They don't do all
the predictability/tonality calculations, but those are simple formulas
with a few parameters (like q_thr[], minval[]) which can be
interpolated to any set of partitions.
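
For the interpolation, something like the following might do. It is only a
sketch: it assumes you know the bark position of each old partition (in
increasing order) and uses plain linear interpolation, clamped at both
ends. interp_param() is a made-up name, and whether linear interpolation
is good enough for q_thr[] or minval[] would have to be checked by
listening.

```c
#include <assert.h>

/*
 * Linearly interpolate a per-partition parameter to bark position z.
 * bark[] holds the (strictly increasing) bark positions of the n old
 * partitions, val[] the parameter value at each of them.
 * Positions outside the old range are clamped to the end values.
 */
double interp_param(const double *bark, const double *val, int n, double z)
{
    int i;

    assert(bark && val && n > 0);
    if (z <= bark[0])
        return val[0];
    if (z >= bark[n - 1])
        return val[n - 1];
    for (i = 1; i < n; i++) {
        if (z <= bark[i]) {
            double t = (z - bark[i - 1]) / (bark[i] - bark[i - 1]);
            return val[i - 1] + t * (val[i] - val[i - 1]);
        }
    }
    return val[n - 1];
}
```

You would call this once per new partition, passing the bark position of
that partition's center as z.
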
Mark
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )