One more data point: I tweaked my copy of the coreutils code to use
inline asm for the "obvious" things, in particular to use the x86
byte-swap instruction instead of the complicated expression
involving shifting and masking.

The measured performance went down.

That is, it took more instructions for GCC 4.0.0 to get the data into
the "right" registers and use the byte-swap instruction than for it to
simply do the shifts and masks itself.  A humbling experience....
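
For illustration only, here is a rough sketch of the two approaches being
compared. It is not the actual coreutils code: the function names are
hypothetical, the 32-bit operand width is an assumption, and the asm
variant assumes GCC-style extended asm on x86 (it falls back to the
portable version elsewhere).

```c
#include <assert.h>
#include <stdint.h>

/* Portable version: reverse the four bytes with shifts and masks only. */
static uint32_t swap32_shift(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Hand-written version (hypothetical helper): x86 bswap via GCC
   extended asm.  The "+r" constraint forces x into a general
   register, which is where extra register moves can come from. */
static uint32_t swap32_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    return swap32_shift(x);   /* non-x86 fallback, assumption */
#endif
}

/* Small self-check: both versions must agree. */
static void check_swaps(void)
{
    assert(swap32_shift(0x11223344u) == 0x44332211u);
    assert(swap32_asm(0x11223344u) == swap32_shift(0x11223344u));
}
```

Comparing the output of "gcc -O2 -S" for callers of each version is one
way to see how many instructions the compiler spends around the asm.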

