The most expensive part of quantize() seems to be the sqrt()
calculations.
Yesterday I found a fast sqrt() calculation (basically a table and some
simple arithmetic) where the size of the table depends upon the required
precision.
I cannot include the algorithm here, but you can find it in:
Andrew Glassner
"Graphics Gems" (don't be fooled by the title ;)
ISBN 0-12-286166-3
on p424 and pp756.
Sorry, my time is extremely limited right now and I do not think I can
follow up on this or write a replacement for lame within the next week.
But I will look for the author of the algorithm again next week; maybe
there is something on the web.
My usual bench with lame3.12pre5:
(P-III/450, everything compiled with the options I proposed in the
Makefile)

Orig: 224s

           egcs-1.1.2   egcs-990602
default:   135s         117s
-m f   :   113s          94s
-f     :    47s          34s!
-m f -f:    54s          40s

Something else:
I tried *pix++ instead of array indices on a late 3.11 or early 3.12
before, but surprisingly it was slightly slower on the above system.
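
For reference, the two loop styles being compared look roughly like
this. This is only an illustrative sketch, not code from lame: the
functions sum_indexed and sum_pointer are hypothetical, and only the
name pix is taken from the expression above.

```c
#include <assert.h>
#include <stddef.h>

/* Array-index style: pix[i] */
static double sum_indexed(const double *pix, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += pix[i];
    return sum;
}

/* Pointer-increment style: *pix++ */
static double sum_pointer(const double *pix, size_t n)
{
    double sum = 0.0;
    const double *end = pix + n;
    while (pix < end)
        sum += *pix++;
    return sum;
}
```

Both compute the same result; which one is faster depends on the
compiler and the CPU, so it has to be measured rather than assumed.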
How many comparisons with constants (temp < 5.5xxx) pay off is directly
related to the number of available FP registers, so adding one
comparison too many leads to worse performance.
Fortunately, lame development moved so fast that I have dropped my
changes in the meantime, but I could gain some performance by either
adding one more comparison or removing one (though I made other changes
as well).
Sorry to be so unspecific but THESE are the things that should be
commented in the source.
Other CPU architectures with more FP registers, such as PPC, could well
gain performance by extending this scheme.
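
To make the comparison scheme a bit more concrete, here is a minimal
sketch of the general idea: small values are classified by comparing
against a few constants, and only larger values take the general
(slower) path. Everything in it is an assumption for illustration. The
function name quantize_small and the thresholds 0.5/1.5/2.5 are made
up, they are not the constants used in lame, and it assumes temp >= 0.

```c
#include <assert.h>

/* Hypothetical: round a non-negative value to the nearest integer,
 * handling the most common small results with plain comparisons. */
static int quantize_small(double temp)
{
    if (temp < 0.5)
        return 0;
    if (temp < 1.5)
        return 1;
    if (temp < 2.5)
        return 2;
    /* General path; adding or removing a comparison above shifts
     * how often this is reached and how many constants the compiler
     * has to keep in FP registers. */
    return (int)(temp + 0.5);
}
```

Each extra comparison needs one more constant kept around, which is why
the best number of comparisons can differ from one CPU to the next.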
Frank
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )