This is the new ass-kicking quantize-pvt.c with inline assembly for MSVC.
Now with assembly for both the new and ISO code, and for both short and regular blocks.
Performance on my test case went from 1:59 to 1:42 at high quality - a 14%
improvement.
Encode rate is right around 3 on a P3-500.
Assembly is unfortunately uncommented, but it's a pretty straightforward
implementation: it unrolls the loop 4 times and keeps constants in FP
registers.
Short blocks are not unrolled, but constants are kept in registers.
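For anyone who wants the idea without reading the uncommented assembly, here is a rough C sketch of the same two tricks: unroll by 4 and keep the loop constants in locals so they can stay in registers. This is not the code from quantize-pvt.c. The function name quantize_unrolled4, the ROUND_OFFSET value, and the plain multiply-add-truncate formula are all assumptions made up for illustration.

```c
#include <stddef.h>

/* Assumed rounding offset, for illustration only. */
#define ROUND_OFFSET 0.4054f

/* Hypothetical helper, not taken from quantize-pvt.c.
   Computes ix[i] = (int)(xr[i] * istep + ROUND_OFFSET) for n values. */
static void quantize_unrolled4(const float *xr, int *ix, size_t n, float istep)
{
    /* Loop-invariant constants live in locals so the compiler can keep
       them in registers for the whole loop. */
    const float step = istep;
    const float offset = ROUND_OFFSET;
    size_t i = 0;

    /* Main loop: four independent elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        ix[i]     = (int)(xr[i]     * step + offset);
        ix[i + 1] = (int)(xr[i + 1] * step + offset);
        ix[i + 2] = (int)(xr[i + 2] * step + offset);
        ix[i + 3] = (int)(xr[i + 3] * step + offset);
    }

    /* Remainder loop: handles the last n % 4 elements one at a time. */
    for (; i < n; i++) {
        ix[i] = (int)(xr[i] * step + offset);
    }
}
```

The four statements in the main loop don't depend on each other, which is what gives the FP pipeline something to overlap. The short-block case would correspond to using only the remainder-style loop, with the constants still hoisted.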
For best performance on regular Pentiums, all fxch instructions should be
moved up to just after the prior FP instruction. I don't have a regular
Pentium here to test on, though.
Enjoy.
quantize-pvt.c