On Wednesday, 18 February 2015 at 15:15:21 UTC, Russel Winder wrote:

The issue is to create a GPGPU kernel (usually C code with bizarre data structures and calling conventions), set it running, and then pipe data in and collect data out. This is currently very slow, but the next generation of Intel chips will fix this (*). And then there is the OpenCL/CUDA debate.

Personally I favour OpenCL, for all its deficiencies, because it is vendor neutral; CUDA binds you to NVIDIA. In any case, there is an NVIDIA back end for OpenCL. With a system like PyOpenCL, the infrastructure, data, and process handling is abstracted, but you still have to write the kernels in C. They really ought to do a Python DSL for that, but… So with D, can we write D kernels and have them compiled and loaded using a combination of CTFE, D → C translation, a C compiler call, and other magic?

I'd like to talk about the kernel languages (having done both OpenCL and CUDA).

A big speed-up factor is the multiple levels of parallelism exposed in OpenCL C and CUDA C:

- context parallelism (e.g. several GPUs)
- command parallelism (based on a futures model)
- block parallelism
- warp/sub-block parallelism
- in each sub-block, N threads (typically 32 or 64)

All of that is supported by appropriate barrier semantics; a sketch of the hierarchy follows below. Typical C-like code only has threads as parallelism, and a less complex cache hierarchy.
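
Here is a minimal CUDA sketch of that hierarchy (every name is illustrative, not taken from any real codebase): the grid splits into blocks, blocks into warps of warpSize threads, each thread knows its position at every level, and __syncthreads() is the per-block barrier.

__global__ void hierarchyDemo(float *data)
{
    __shared__ float tile[256];               // block-local memory, shared by the whole block

    int tid  = threadIdx.x;                   // index within the block
    int gid  = blockIdx.x * blockDim.x + tid; // index within the whole grid
    int warp = tid / warpSize;                // warp index within the block (warpSize is 32 on NVIDIA)

    tile[tid] = data[gid];                    // each thread stages one element
    __syncthreads();                          // block-wide barrier: tile[] is fully populated past this line

    data[gid] = tile[tid] + (float)warp;      // placeholder use of the staged data
}

int main()
{
    const int n = 1024;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    hierarchyDemo<<<n / 256, 256>>>(d);       // grid of 4 blocks, 256 threads (8 warps) each
    cudaDeviceSynchronize();
    cudaFree(d);
}

Context parallelism (several GPUs) and command parallelism (streams/queues of asynchronous launches) sit above this, on the host side.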

Also, most algorithms don't translate all that well to SIMD threads working in lockstep.

Example: instead of looping over the 2D image and performing a horizontal blur over 15 pixels per output, perform the operation on 32x16 blocks simultaneously, while caching data in block-local memory (a sketch follows).
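
A hedged CUDA sketch of that blur, assuming a row-major grayscale float image whose width and height are multiples of the block size (all names here are illustrative): each 32x16 block stages a 46-wide tile, apron included, into shared memory, then every thread averages its 15-pixel window from that fast block-local copy.

#define RADIUS 7   // 15-pixel window: RADIUS left + centre + RADIUS right
#define BX 32
#define BY 16

__global__ void hblur(const float *in, float *out, int width)
{
    // Tile with a RADIUS-wide apron on each side, in block-local memory.
    __shared__ float tile[BY][BX + 2 * RADIUS];

    int x = blockIdx.x * BX + threadIdx.x;
    int y = blockIdx.y * BY + threadIdx.y;

    // Stage the centre pixel.
    tile[threadIdx.y][threadIdx.x + RADIUS] = in[y * width + x];

    // Threads at the left edge of the block also stage the two aprons,
    // clamping at the image borders.
    if (threadIdx.x < RADIUS) {
        int lx = max(x - RADIUS, 0);
        int rx = min(x + BX, width - 1);
        tile[threadIdx.y][threadIdx.x] = in[y * width + lx];
        tile[threadIdx.y][threadIdx.x + BX + RADIUS] = in[y * width + rx];
    }
    __syncthreads();   // barrier: the whole tile is staged

    // Each thread averages its 15-pixel window out of shared memory.
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k)
        sum += tile[threadIdx.y][threadIdx.x + RADIUS + k];
    out[y * width + x] = sum / (2 * RADIUS + 1);
}

Launched as hblur<<<dim3(width / BX, height / BY), dim3(BX, BY)>>>(d_in, d_out, width), each input pixel is read from global memory roughly once per block instead of 15 times.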

It is much like an auto-vectorization problem, and auto-vectorization is hard.



