On Saturday, 7 July 2018 at 13:26:10 UTC, Guillaume Piolat wrote:
On Friday, 6 July 2018 at 23:08:27 UTC, Random D user wrote:
Especially since D doesn't even attempt any auto-vectorization (poor results and difficult to implement), and manual loops are quite tedious to write (even std.simd failed to materialize), SPMD would be a nice alternative.

I think you are mistaken; D code is often autovectorized when using LDC.

That is good to know.
I haven't looked that much into LDC (or clang). I mostly use dmd for its fast edit-compile cycle, although the plan is to use LDC for the "release"/optimized build eventually.

Anyway, I would just want to code some non-trivial loops in SIMD, but I wouldn't want to fiddle with intrinsics or write a higher-level wrapper for them.

In my experience, you can only get the real benefits out of SIMD if you carefully handcraft your hot loops to fully use it. Sprinkling some SIMD here and there with a SIMD vector type doesn't really seem to yield big benefits.


Sometimes it's not, and it's hard to know why.

Exactly.
In my experience, compilers (MSVC) often don't.

A pragma we could have is the one in the Intel C++ Compiler that says "hey this loop is safe to autovectorize".

What do you think?

I think that ispc is like OpenCL on the CPU, but it can't work on the GPU, FPGA or any other OpenCL implementation. OpenCL is so fast because caching is explicit (several levels of memory are exposed).

Yeah, it should be similar. The point is not to run it on the GPU; you can use CUDA, OpenCL, compute shaders, etc. for that. CPU code is much easier to debug, and sometimes you're already doing things on the GPU while your CPU side has more room for computation. You also don't have to copy your data between the GPU and CPU or deal with latency. Of course, OpenCL runs on the CPU too, but I think there's quite a bit of code required to set it up and use it.

I guess my point was that I would like to write CPU SIMD code easily without intrinsics (or manually trying to trick the compiler into vectorizing the code). SPMD stuff seems to solve these issues. It would also be a forward-looking step for D.

Ideally, you just write your loop normally, debug it, and add an annotation to get it to run fast on SIMD. Done.
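
For reference, C compilers already offer something close to this through OpenMP's `simd` directive, which plays a role similar to the Intel pragma mentioned earlier. The following is only a sketch in C, not D: the `saxpy` function and its parameters are made up for illustration, and whether the loop is actually vectorized still depends on the compiler and flags (for example `-fopenmp-simd` on GCC or Clang).

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
   `restrict` promises the compiler that x and y do not overlap. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    /* The annotation: asks the compiler to vectorize this loop.
       It only takes effect when OpenMP SIMD support is enabled;
       otherwise it is ignored and the loop is compiled as usual. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Without OpenMP enabled, the pragma is simply ignored and the loop runs as ordinary code, so the same source can be debugged first and annotated afterwards.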
