If you want to find out how to best optimize for Intel chips, you
would get the Intel compilers and see what they do.  The same thing is
useful for z.

One suggestion would be to code an algorithm similar to what you want
to write in assembler, run it through the IBM xlC/C++ compiler (or
Metal C) at the highest optimization level, and look at the generated
pseudo-assembler listing.  Make sure to compile with "ARCH" at the
level of the target machine's instruction set.

You'll find that everything runs "baseless", and that memcpy() (where
the length is not known to the compiler) generates code that doesn't
use MVCL.  The loop unrolling it does is also fascinating, as is the
storage access/use decoupling it does to take advantage of
pipelining.

Given that IBM is really the only one who knows completely how to
optimize for z, at some point you have to ask "why write in assembler"
for z?  For Linux on z, the answer is that most people shouldn't
unless they write device drivers.  On z/OS, the system API is still
largely assembler (for customers; IBM uses PL/X), and Metal C doesn't
really offer a practical alternative.  Dignus' approach for embedding
assembler in C/C++ seems much cleaner to me, but aren't we at the
mercy of IBM to do really good optimizing compilers?

Kirk Wolf
Dovetailed Technologies
http://dovetail.com
