One more data point: I tweaked my copy of the coreutils code to use asms for the "obvious" things, in particular, to use the byte-swapping instructions on the x86 instead of the complicated expression involving shifting and masking.
The measured performance went down. That is, it took more instructions for GCC 4.0.0 to get the data into the "right" registers and use the byte-swap instruction than for it to simply do the shifts and masks itself. A humbling experience...
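For anyone who wants to try the comparison themselves, here is a rough sketch of the two styles. This is not the actual coreutils code: the function names swap32_shift and swap32_asm are made up for illustration, and the asm variant assumes GCC-style extended asm on x86 (it falls back to the portable version elsewhere).

```c
#include <stdint.h>

/* Portable version: plain shifts and masks. The compiler sees the whole
   expression and is free to schedule it and pick registers as it likes. */
static uint32_t swap32_shift(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}

/* Hand-written version: the x86 bswap instruction via GCC extended asm.
   The "+r" constraint asks for x in a general register, used as both
   input and output. The compiler cannot look inside the asm statement,
   so it has to arrange registers around it. */
static uint32_t swap32_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Assumption: on other compilers or targets, just use the portable code. */
    return swap32_shift(x);
#endif
}
```

Both functions should return the same value for any input, so the only thing to compare is the generated code (for example with gcc -O2 -S) and the timing inside the real caller, which will depend on the compiler version and surrounding code.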
