Hi,

        I am trying to use FP16 (half precision) with pycuda. However, I have
encountered an issue when using it in element-wise kernels.

        Consider this very simple kernel:
==============================
import numpy as np
import pycuda.autoinit
from pycuda.elementwise import ElementwiseKernel as CU_ElK
import pycuda.gpuarray as cua
cu_options = ['-use_fast_math', '-D__CUDA_NO_HALF_OPERATORS__',
              '-D__CUDA_NO_HALF2_OPERATORS__']
testk = CU_ElK(name='testk', operation="d[i] *= 2",
               preamble='#include <cuda_fp16.h>',
               options=cu_options, arguments="float *d")
cu_d = cua.empty(128, dtype=np.float32)
testk(cu_d)
==============================
(The kernel does not even use half precision; including the fp16 header is
enough to trigger the issue.)

        This works on macOS (it only requires -D__CUDA_NO_HALF_OPERATORS__ to
avoid multiple-linkage errors), but on Debian 9 and Ubuntu 20 it fails with a
series of errors like:
...
/usr/include/c++/8/bits/stl_pair.h(446): error: this declaration may not have extern "C" linkage
...

These come from cuda_fp16.h, which uses C++ standard library headers (std::move,
etc.).

        This is because the kernel is compiled inside an extern "C" block, which
is necessary to avoid C++ name mangling so that the element-wise kernel function
can still be accessed by name.

        The workaround is to include the cuda_fp16.h header _before_ the
extern "C" block; I've tested this and it runs without a hitch.
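A minimal sketch of this workaround using SourceModule directly, assuming a working CUDA toolchain and GPU: the header comes first, and the kernel (a hand-written stand-in for the element-wise "d[i] *= 2", with an added bounds argument n) sits inside an explicit extern "C" block, so no_extern_c=True stops pycuda from adding its own wrapper. The names KERNEL_SRC and run_testk are illustrative only.

```python
# Hypothetical stand-in for the element-wise kernel: cuda_fp16.h is included
# *before* the extern "C" block, so its C++ (STL) code keeps C++ linkage,
# while the kernel itself keeps an unmangled name.
KERNEL_SRC = r"""
#include <cuda_fp16.h>

extern "C" {
__global__ void testk(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] *= 2;
}
}
"""


def run_testk(n=128, block=128):
    """Compile KERNEL_SRC and double a vector of ones on the GPU."""
    # Imported here so KERNEL_SRC can be inspected on a machine without CUDA.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.gpuarray as cua
    from pycuda.compiler import SourceModule

    # no_extern_c=True: pycuda does not wrap the source in its own extern "C".
    mod = SourceModule(KERNEL_SRC, no_extern_c=True, options=["-use_fast_math"])
    testk = mod.get_function("testk")  # plain name, thanks to our extern "C"

    cu_d = cua.to_gpu(np.ones(n, dtype=np.float32))
    grid = ((n + block - 1) // block, 1)
    testk(cu_d.gpudata, np.int32(n), block=(block, 1, 1), grid=grid)
    return cu_d.get()
```

On a machine where the header compiles this way, run_testk() should return an array filled with 2.0.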

        So my question is how to proceed. As much as possible, I'd like to use
pycuda directly, without having to write derived versions of SourceModule and
the element-wise code.

        I see two options:

1) Is there a way to have an element-wise kernel with no_extern_c=True? I don't
know how to resolve the name mangling issue to access the kernel function.

2) Add a 'cpp_preamble' option to SourceModule and ElementwiseKernel (and
others) to insert a preamble before the extern "C" block.
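For option 1, one possibility is to compute the mangled name and pass it to get_function. This is a sketch that assumes nvcc mangles __global__ functions with the Itanium C++ ABI used on GCC/Clang hosts such as Debian and Ubuntu. The hypothetical mangle() helper only handles unqualified functions whose parameters are built-in types or pointers to them (no const, references, namespaces or templates), so it is safer to check the real symbol name in the compiled module.

```python
# Hypothetical helper: Itanium C++ ABI mangled name for an unqualified free
# function whose parameters are built-in types or (multi-level) pointers to them.
BUILTIN = {
    "void": "v", "bool": "b", "char": "c", "short": "s", "int": "i",
    "unsigned int": "j", "long": "l", "unsigned long": "m",
    "long long": "x", "unsigned long long": "y",
    "float": "f", "double": "d",
}
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def _seq_id(n):
    """Base-36 number used in substitution references (S0_, S1_, ...)."""
    s = ""
    while True:
        n, r = divmod(n, 36)
        s = DIGITS[r] + s
        if n == 0:
            return s


def mangle(name, arg_types):
    subs = []  # pointer types seen so far; builtins are never substituted

    def plain(t):  # encoding without substitutions, used as a lookup key
        t = t.strip()
        return "P" + plain(t[:-1]) if t.endswith("*") else BUILTIN[t]

    def encode(t):
        t = t.strip()
        if not t.endswith("*"):
            return BUILTIN[t]
        key = plain(t)
        if key in subs:  # repeated pointer type: back-reference S_, S0_, ...
            i = subs.index(key)
            return "S_" if i == 0 else "S%s_" % _seq_id(i - 1)
        out = "P" + encode(t[:-1])
        subs.append(key)  # registered after its inner types, as the ABI requires
        return out

    params = "".join(encode(t) for t in arg_types) or "v"  # f() -> f(void)
    return "_Z%d%s%s" % (len(name), name, params)


print(mangle("testk", ["float *", "int"]))  # _Z5testkPfi
```

With a module built as SourceModule(src, no_extern_c=True), the kernel would then be fetched with mod.get_function(mangle("testk", ["float *", "int"])).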


        I could propose a PR for 2), but I'd like to know whether it would be
acceptable in pycuda. Note that it also removes the need for
-D__CUDA_NO_HALF_OPERATORS__.
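A minimal sketch of how source assembly could work with option 2. Neither cpp_preamble as a pycuda parameter nor the wrap_source() helper exists today; both are hypothetical names. The point is only the ordering: cpp_preamble is emitted before the extern "C" block, while the usual preamble and the kernel stay inside it.

```python
def wrap_source(source, preamble="", cpp_preamble="", no_extern_c=False):
    """Hypothetical sketch of the proposed option.

    cpp_preamble (e.g. C++ #includes) is placed before the extern "C" block,
    so headers such as cuda_fp16.h keep C++ linkage; preamble and the kernel
    source go inside the block.
    """
    body = "%s\n%s" % (preamble, source)
    if not no_extern_c:
        body = 'extern "C" {\n%s\n}\n' % body
    return "%s\n%s" % (cpp_preamble, body) if cpp_preamble else body


kernel = """
__global__ void testk(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2;
}
"""
src = wrap_source(kernel, cpp_preamble="#include <cuda_fp16.h>")
print(src)
```

Until such an option exists, a string built this way already carries its own extern "C" block, so it could be passed to SourceModule(src, no_extern_c=True).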

        Thanks,
                Vincent
-- 
Vincent Favre-Nicolin

Co-editor, J. Synchrotron Radiation  http://journals.iucr.org/s/

Director, HERCULES school   http://hercules-school.eu

ESRF-The European Synchrotron    http://www.esrf.eu
71, Avenue des Martyrs
Grenoble, France

X-Ray NanoProbe (XNP) group
Tel: +33 4 76 88 28 11

On leave from Univ. Grenoble Alpes

_______________________________________________
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net
