Hi, I am trying to use FP16 (half precision) with pycuda. However, I have run into an issue when using it in element-wise kernels.
If I try a very simple kernel:

==============================
import numpy as np
import pycuda.autoinit
from pycuda.elementwise import ElementwiseKernel as CU_ElK
import pycuda.gpuarray as cua

cu_options = ['-use_fast_math', '-D__CUDA_NO_HALF_OPERATORS__', '-D__CUDA_NO_HALF2_OPERATORS__']
testk = CU_ElK(name='testk', operation="d[i] *= 2",
               preamble='#include <cuda_fp16.h>',
               options=cu_options, arguments="float *d")
cu_d = cua.empty(128, dtype=np.float32)
testk(cu_d)
==============================

(the kernel does not even use half precision; including the fp16 header is enough to trigger the issue)

This works on macOS (it only requires -D__CUDA_NO_HALF_OPERATORS__ to avoid multiple linkage), but on Debian 9 and Ubuntu 20 it fails with a bunch of errors like:

...
/usr/include/c++/8/bits/stl_pair.h(446): error: this declaration may not have extern "C" linkage
...

which come from cuda_fp16.h using STL headers (std::move etc.). This is due to the kernel being compiled inside an 'extern "C"' block, which is necessary to avoid C++ name mangling and still be able to access the element-wise kernel function.

The workaround is to include the cuda_fp16.h header _before_ the 'extern "C"'. I've tested this and it runs without a hitch.

So my question is how to proceed. I'd like as much as possible to use pycuda directly, without having to write a derived version of SourceModule and the element-wise code. I see two options:

1) use an element-wise kernel with no_extern_c=True, if there is a way to do that, but I don't know how to resolve the name mangling issue to access the kernel function;
2) add a 'cpp_preamble' option to SourceModule and ElementwiseKernel (and others) to insert a preamble before the 'extern "C"'.

I could propose a PR for 2), but I'd like to know if that would be acceptable in pycuda. Note that it also removes the need for -D__CUDA_NO_HALF_OPERATORS__.

Thanks,
Vincent
--
Vincent Favre-Nicolin
Co-editor, J.
Synchrotron Radiation http://journals.iucr.org/s/
Director, HERCULES school http://hercules-school.eu
ESRF-The European Synchrotron http://www.esrf.eu
71, Avenue des Martyrs
Grenoble, France
X-Ray NanoProbe (XNP) group
Tel: +33 4 76 88 28 11
On leave from Univ. Grenoble Alpes
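
For reference, here is a minimal sketch of the workaround described in the message, applied to a hand-written kernel instead of ElementwiseKernel. It relies on the existing no_extern_c argument of pycuda's SourceModule and places the header before an explicit 'extern "C"' block. The helper names build_source and compile_testk and the rewritten testk kernel are hypothetical, for illustration only. This is not the proposed 'cpp_preamble' option, and it has not been checked against the failing Debian/Ubuntu setups.

```python
def build_source(cpp_preamble: str, kernel_code: str) -> str:
    """Put C++ headers first, then wrap only the kernel in extern "C".

    Hypothetical helper: it mimics what a 'cpp_preamble' option could do.
    """
    return (
        f"{cpp_preamble}\n"
        'extern "C" {\n'
        f"{kernel_code}\n"
        "}\n"
    )


# Hand-written equivalent of the element-wise operation "d[i] *= 2".
KERNEL = """
__global__ void testk(float *d, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] *= 2;
}
"""

source = build_source("#include <cuda_fp16.h>", KERNEL)


def compile_testk(src: str):
    """Compile src and return the kernel (needs a CUDA device and pycuda)."""
    # Imported lazily so the string-building part runs without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    from pycuda.compiler import SourceModule

    # no_extern_c=True stops pycuda from wrapping the whole source in
    # extern "C"; the explicit block above keeps the name "testk" unmangled.
    mod = SourceModule(src, no_extern_c=True)
    return mod.get_function("testk")
```

On a machine with a GPU, compile_testk(source) should return a function that can be launched with explicit block and grid sizes. The element-wise convenience layer is lost this way, which is why option 2) would still be preferable.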
_______________________________________________
PyCUDA mailing list -- pycuda@tiker.net