The most expensive part of quantize() seems to be the sqrt()
calculations.
Yesterday I found a fast sqrt() calculation (basically a table and some 
simple arithmetic) where the size of the table depends upon the required 
precision.
I can not include the algorithm here but you can find it in:

Andrew Glassner
"Graphics Gems" (don't be fooled by the title ;)
ISBN 0-12-286166-3

on p424 and pp756.
Sorry, my time is extremely limited right now and I do not think I can
follow this up or write a substitute for lame in the next week. But I
will look again for the author of the algorithm next week; maybe there
is something on the web.


My usual bench with lame3.12pre5:
(P-III/450, everything compiled with the options I proposed in the
Makefile)

Orig:    224s

         egcs-1.1.2   egcs-990602
default: 135s         117s
-m f   : 113s          94s
-f     :  47s          34s!
-m f -f:  54s          40s

Something else:
I tried *pix++ instead of array indices on a late 3.11 or early 3.12
before, but surprisingly it was slightly slower on the above system.
The efficiency of the number of comparisons with constants (temp <
5.5xxx) is directly related to the number of available FP registers,
so adding one comparison too many leads to worse performance.
Fortunately lame development moved so fast that I have dropped my
changes in the meantime, but I was able to gain some performance by
either adding one more comparison or removing one (though I made other
changes as well).
Sorry to be so unspecific but THESE are the things that should be
commented in the source.
Other CPU architectures with more FP registers, e.g. PPC, could well
gain performance by extending this scheme.

Frank


--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
