On Monday, 4 April 2016 at 22:34:06 UTC, Walter Bright wrote:
On 4/4/2016 2:05 PM, 9il wrote:
- Count of FP/Integer registers
??
How many general purpose registers, SIMD floating point
registers, and SIMD integer
registers does the CPU have?
These are deducible from X86, X86_64, and SIMD version
identifiers.
It is impossible to deduce from that combination that Xeon Phi
has 32 FP registers.
Need to know at compile time whether the target is AVX or AVX2
Since the compiler never generates AVX or AVX2 instructions,
there is no purpose to setting such as a predefined version
identifier. You might as well use a:
-version=AVX
switch. Note that it is a very bad idea for a compiler to
detect the CPU it is running on and, by default, generate code
specific to that CPU.
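A minimal sketch of how such a switch would be consumed in D source. The identifier `AVX` is user-defined (it is set only when the build passes `-version=AVX`), and `floatsPerVector` and `vectorsNeeded` are hypothetical names used for illustration:

```d
// Selected at compile time: 8 floats fit in a 256-bit AVX register,
// 4 floats fit in a 128-bit SSE register.
version (AVX)
    enum size_t floatsPerVector = 8;
else
    enum size_t floatsPerVector = 4;

// Number of vector-sized chunks needed to cover n floats (rounded up).
size_t vectorsNeeded(size_t n)
{
    return (n + floatsPerVector - 1) / floatsPerVector;
}

unittest
{
    assert(vectorsNeeded(0) == 0);
    assert(vectorsNeeded(floatsPerVector) == 1);
    assert(vectorsNeeded(floatsPerVector + 1) == 2);
}
```

Without the switch the `else` branch is compiled, so the same source builds for either target.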
"Since the compiler never generates AVX or AVX2" - this is
definitely not true; see, for example, LLVM's loop vectorization
and SLP vectorization.
This is a normal situation for scientific software, supercomputer
software, and high performance server applications
(the source code may be completely different for these cases).
It's entirely practical to compile code with different source
code, link them *both* into the executable, and switch between
them based on runtime detection of the CPU.
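A rough sketch of that runtime-dispatch pattern in D, assuming an x86 target where druntime's `core.cpuid` reports the feature. The names `dotGeneric`, `dotAvx2` and `selectDot` are hypothetical, and the "AVX2" kernel is only a stand-in that forwards to the generic code:

```d
import core.cpuid : avx2;

alias DotFn = float function(const(float)[] a, const(float)[] b);

// Portable fallback; works on any CPU.
float dotGeneric(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    float sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

// Stand-in for a hand-tuned AVX2 kernel. In a real build this would be
// different source, compiled separately; here it just forwards.
float dotAvx2(const(float)[] a, const(float)[] b)
{
    return dotGeneric(a, b);
}

// Pick the implementation once, based on the CPU the program runs on.
DotFn selectDot()
{
    return avx2 ? &dotAvx2 : &dotGeneric;
}

unittest
{
    auto dot = selectDot();
    assert(dot([1f, 2f, 3f], [4f, 5f, 6f]) == 32f);
}
```

Both kernels are linked into the executable; callers only see the function pointer returned by `selectDot`.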
This approach is complex, and it is normal for desktop
applications. If you have a big cluster of similar computers or a
supercomputer cluster, the only thing you want to do is pass
`-mcpu=native` / `-march=native`. This single compiler flag
should be enough to build a high performance linear algebra
application.
We have LDC and GDC. It looks like a little
standardization based on DMD
would be good, even if it would be useless for DMD itself.
There is no such thing as a standard compiler floating point
switch, and I'm doubtful defining one would be practical or
make much of any sense.
I just want a unified tool for getting compile-time (CT)
information about the target and optimization switches. It is OK
if this information is set by different switches on different
compilers.
With compile-time information about the CPU it is possible to
always have a fast
generic BLAS for any target as soon as LLVM supports
that target.
The SIMD instruction set is highly resistant to transforming
generic code into optimal vector instructions. Yes, I know
about auto-vectorization, and in general it is a doomed and
unworkable technology.
http://www.amazon.com/dp/0974364924
It's gotta be done by hand to get it to fly.
Auto-vectorization is only an example (maybe a bad one). I would
use SIMD vectors, but I need CT information about the target CPU,
because it is impossible to build optimal BLAS kernels without
it! My idea is an internal kernel compiler :-) Something similar
to compile-time regex, but more complex.
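A very rough sketch of what I mean, assuming the compiler could hand us something like the `TargetInfo` below. Everything here is hypothetical: `TargetInfo`, `accumulators` and `dot` are names made up for illustration, and today the values would have to come from `-version` switches or be hard-coded per build. The point is only that the kernel shape is computed at compile time from the target description:

```d
// Hypothetical description of the target CPU.
struct TargetInfo
{
    size_t simdRegisters;   // e.g. 16 on x86-64 with AVX, 32 on Xeon Phi
    size_t floatsPerVector; // e.g. 8 for 256-bit, 16 for 512-bit vectors
}

// Evaluated at compile time (CTFE): how many registers the kernel may use
// as accumulators, keeping two free for loading the operands.
size_t accumulators(TargetInfo t)
{
    return t.simdRegisters > 2 ? t.simdRegisters - 2 : 1;
}

// Dot product specialized on the target description.
float dot(TargetInfo target)(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    enum lanes = accumulators(target) * target.floatsPerVector;
    float[lanes] acc = 0;
    size_t i = 0;
    for (; i + lanes <= a.length; i += lanes)
        foreach (k; 0 .. lanes) // trip count is a compile-time constant
            acc[k] += a[i + k] * b[i + k];
    float sum = 0;
    foreach (x; acc)
        sum += x;
    for (; i < a.length; ++i) // remainder that does not fill a block
        sum += a[i] * b[i];
    return sum;
}

unittest
{
    enum tiny = TargetInfo(3, 2); // 1 accumulator * 2 lanes
    assert(dot!tiny([1f, 2f, 3f, 4f, 5f], [1f, 1f, 1f, 1f, 1f]) == 15f);
}
```

A real kernel would use SIMD vector types and a proper register-blocking model instead of a plain `float` array; this only shows where the CT information enters.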
Best regards,
Ilya