Re: Any usable SIMD implementation?

9il via Digitalmars-d Mon, 04 Apr 2016 23:17:06 -0700

On Monday, 4 April 2016 at 22:34:06 UTC, Walter Bright wrote:

On 4/4/2016 2:05 PM, 9il wrote:
- Count of FP/Integer registers
??
How many general purpose registers, SIMD floating point registers, and SIMD integer
registers does a CPU have?
These are deducible from X86, X86_64, and SIMD version identifiers.

It is impossible to deduce from that combination that Xeon Phi has 32 FP registers.

Need to know at compile time whether it is AVX or AVX2
Since the compiler never generates AVX or AVX2 instructions, there is no purpose to setting such as a predefined version identifier. You might as well use a:
    -version=AVX
switch. Note that it is a very bad idea for a compiler to detect the CPU it is running on and default to generating code specific to that CPU.
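
A minimal sketch of the user-defined switch suggested above, assuming the identifier `AVX` is chosen by the programmer and passed on the command line (`-version=AVX` for DMD; LDC and GDC spell the flag `-d-version=AVX` and `-fversion=AVX`). The constant `floatLanes` is hypothetical and only illustrates branching on the identifier:

```d
import std.stdio : writeln;

// AVX is NOT set by the compiler here; it exists only if the build
// passes -version=AVX (or the LDC/GDC equivalent).
version (AVX)
{
    enum size_t floatLanes = 8; // 256-bit vectors hold 8 floats
}
else
{
    enum size_t floatLanes = 4; // assumption: fall back to 128-bit SSE
}

void main()
{
    writeln("float lanes per vector: ", floatLanes);
}
```

Built without the switch, the `else` branch is taken; nothing here checks that the target CPU really supports AVX, which is the gap discussed in the reply below.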

"Since the compiler never generates AVX or AVX2" - this is definitely not true; see, for example, LLVM vectorization and SLP vectorization.

This is a normal situation for scientific software, supercomputer software, and high performance server applications.

(this may be completely different source code for these cases).
It's entirely practical to compile code with different source code, link them *both* into the executable, and switch between them based on runtime detection of the CPU.
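
A rough sketch of that runtime-switching approach, assuming an x86 target and using `core.cpuid` from druntime for detection. The kernels `sumGeneric` and `sumAvx2` are hypothetical placeholders; in a real build each variant would be compiled separately with its own target options:

```d
import core.cpuid : avx2;
import std.stdio : writeln;

// Portable fallback kernel.
float sumGeneric(const(float)[] a)
{
    float s = 0;
    foreach (x; a)
        s += x;
    return s;
}

// Placeholder for a hand-tuned AVX2 kernel; it simply forwards to the
// generic code so that the sketch stays self-contained.
float sumAvx2(const(float)[] a)
{
    return sumGeneric(a);
}

alias SumFn = float function(const(float)[]);

// Choose the implementation once, based on the CPU the program runs on.
SumFn pickSum()
{
    return avx2 ? &sumAvx2 : &sumGeneric;
}

void main()
{
    auto sum = pickSum();
    writeln(sum([1.0f, 2.0f, 3.0f]));
}
```

Both variants are linked into the executable and the choice costs one indirect call, but every kernel has to exist in every flavour.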

This approach is complex, and normal for desktop applications. If you have a big cluster of similar computers or a supercomputer cluster, the only thing you want to do is `-mcpu=native` / `-march=native`. And this single compiler flag should be enough to build a high performance linear algebra application.

We have LDC and GDC. And it looks like a little bit of standardization based on DMD
would be good, even if this would be useless for DMD.
There is no such thing as a standard compiler floating point switch, and I'm doubtful defining one would be practical or make much of any sense.

I just want a unified instrument to receive CT information about the target and optimization switches. It is OK if this information comes from different switches on different compilers.

With compile time information about the CPU it is possible to always have a fast generic BLAS for any target as soon as LLVM is released for this target.
The SIMD instruction set is highly resistant to transforming generic code into optimal vector instructions. Yes, I know about auto-vectorization, and in general it is a doomed and unworkable technology.
  http://www.amazon.com/dp/0974364924

It's gotta be done by hand to get it to fly.

Auto vectorization is only an example (maybe a bad one). I would use SIMD vectors, but I need CT information about the target CPU, because it is impossible to build optimal BLAS kernels without it! My idea is an internal kernel compiler :-) Something similar to compile time regex, but more complex.
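
To illustrate why the register count matters, here is a toy sketch of choosing a GEMM micro-kernel tile at compile time. `Target`, `Tile` and `chooseTile` are hypothetical names, the heuristic is only an assumption for illustration, and the target descriptions are filled in by hand because no unified way to query them exists yet:

```d
import std.stdio : writeln;

// Hypothetical description of a target, as a compiler might expose it.
struct Target
{
    size_t vectorRegisters; // number of SIMD registers
    size_t floatLanes;      // floats per SIMD register
}

// Micro-kernel tile of the result matrix kept in registers.
struct Tile
{
    size_t rows;
    size_t cols;
}

// Toy heuristic: each tile row spans two vectors. The register budget is
// rows * 2 accumulators + 2 registers for a row of B + 1 for an element of A.
Tile chooseTile(Target t)
{
    enum size_t vecsPerRow = 2;
    immutable rows = (t.vectorRegisters - vecsPerRow - 1) / vecsPerRow;
    return Tile(rows, vecsPerRow * t.floatLanes);
}

enum avx2Target = Target(16, 8);     // x86_64 with AVX2: 16 registers of 8 floats
enum xeonPhiTarget = Target(32, 16); // Xeon Phi: 32 registers of 16 floats

// Evaluated entirely at compile time (CTFE).
static assert(chooseTile(avx2Target) == Tile(6, 16));
static assert(chooseTile(xeonPhiTarget) == Tile(14, 32));

void main()
{
    enum tile = chooseTile(avx2Target);
    writeln(tile.rows, " x ", tile.cols);
}
```

The same generic code yields a different tile for each target, but only if the register count and vector width are known at compile time.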


Best regards,
Ilya
