On Monday, 4 April 2016 at 22:34:06 UTC, Walter Bright wrote:
On 4/4/2016 2:05 PM, 9il wrote:
- Count of FP/Integer registers
??
How many general-purpose registers, SIMD floating-point registers, and SIMD integer
registers does a CPU have?

These are deducible from X86, X86_64, and SIMD version identifiers.


It is impossible to deduce from that combination that Xeon Phi has 32 FP registers.

I need to know at compile time whether the target is AVX or AVX2

Since the compiler never generates AVX or AVX2 instructions, there is no purpose to setting such as a predefined version identifier. You might as well use a:

    -version=AVX

switch. Note that it is a very bad idea for a compiler to detect the CPU it is running on and default generate code specific to that CPU.
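A minimal sketch of what the suggested switch would look like in user code. It assumes `AVX` is a user-chosen version identifier passed on the command line, not one predefined by the compiler; `vectorBytes`, `kernelName`, and `doubleLanes` are hypothetical names used only for illustration.

```d
// Build with:  dmd -version=AVX app.d
// (LDC spells the switch -d-version=AVX, GDC spells it -fversion=AVX.)
// AVX is a user-defined identifier here, not a predefined one.
version (AVX)
{
    enum size_t vectorBytes = 32;        // 256-bit vector registers
    enum string kernelName = "avx";
}
else
{
    enum size_t vectorBytes = 16;        // assumed 128-bit baseline
    enum string kernelName = "generic";
}

/// Number of double lanes a kernel would process per vector register.
enum size_t doubleLanes = vectorBytes / double.sizeof;

void main()
{
    import std.stdio : writeln;
    writeln(kernelName, ": ", doubleLanes, " doubles per vector");
}
```

The selection happens entirely at compile time, so only one branch ends up in the binary. The identifier carries no information beyond what the person invoking the compiler asserts about the target.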


"Since the compiler never generates AVX or AVX2" - this is definitely not true; see, for example, LLVM's loop vectorizer and SLP vectorizer.

This is a normal situation for scientific software, supercomputer software, and high-performance server applications.


(the source code may be completely different for these cases).

It's entirely practical to compile code with different source code, link them *both* into the executable, and switch between them based on runtime detection of the CPU.
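A rough sketch of that approach, assuming druntime's `core.cpuid` is used for the runtime check. `dotGeneric`, `dotAvx2`, `pickKernel`, and the `Kernel` alias are hypothetical names, and the "AVX2" kernel is only a stand-in that reuses the generic loop so the example stays runnable on any machine.

```d
import core.cpuid : avx2;

/// Signature shared by every variant of the kernel.
alias Kernel = double function(const(double)[] a, const(double)[] b);

/// Portable fallback: plain scalar dot product.
double dotGeneric(const(double)[] a, const(double)[] b)
{
    assert(a.length == b.length);
    double sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

/// Stand-in for a hand-tuned AVX2 variant. A real one would live in its
/// own source and use vector types; this one just forwards to the
/// generic loop.
double dotAvx2(const(double)[] a, const(double)[] b)
{
    return dotGeneric(a, b);
}

/// Both variants are linked in; pick one from the detected CPU feature.
Kernel pickKernel(bool hasAvx2)
{
    return hasAvx2 ? &dotAvx2 : &dotGeneric;
}

void main()
{
    import std.stdio : writeln;
    auto dot = pickKernel(avx2);   // runtime CPU detection
    writeln(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]));
}
```

The detection cost is paid once when the function pointer is chosen. After that, each call goes through an indirect call, which is one reason this pattern is usually applied to whole kernels and not to small inner helpers.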


This approach is complex, and it is normal for desktop applications. If you have a big cluster of similar computers or a supercomputer cluster, the only thing you want to do is pass `-mcpu=native`/`-march=native`. This single compiler flag should be enough to build a high-performance linear algebra application.


We have LDC and GDC, and it looks like a little standardization based on DMD
would be good, even if it would be useless for DMD itself.

There is no such thing as a standard compiler floating point switch, and I'm doubtful defining one would be practical or make much of any sense.


I just want a unified way to get compile-time (CT) information about the target and the optimization switches. It is OK if this information is set by different switches on different compilers.


With compile-time information about the CPU it is possible to always have a fast generic BLAS for any target as soon as LLVM is released for that target.

The SIMD instruction set is highly resistant to transforming generic code into optimal vector instructions. Yes, I know about auto-vectorization, and in general it is a doomed and unworkable technology.

  http://www.amazon.com/dp/0974364924

It's gotta be done by hand to get it to fly.

Auto-vectorization is only an example (maybe a bad one). I would use SIMD vectors, but I need CT information about the target CPU, because it is impossible to build optimal BLAS kernels without it! My idea is an internal kernel compiler :-) Something similar to compile-time regex, but more complex.

Best regards,
Ilya
