I'm encountering this error when I run my code in the same Docker environment
but on a different workstation.

```
Traceback (most recent call last):
  File "simple_peer.py", line 76, in <module>
    tslr_gpu, lr_gpu = mp.initialise()
  File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in initialise
    """, arch='sm_60')
  File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid -
```
A quick search only turned up
https://github.com/inducer/pycuda/issues/45 , but it doesn't seem
relevant to my problem, since the code runs fine on my original workstation.
Can anyone see what the issue is?

Below is my code that I'm trying to run:
```
def initialise(self):
    """
    Documentation here
    """

    mod = SourceModule("""
    #include <math.h>
    __global__ void initial(float *tslr_out, float *lr_out, float *W_gpu,
                            float *b_gpu, int *x_gpu, int d, float temp)
    {
        int tx = threadIdx.x;

        // Wx stores the W_ji x_i product value
        float Wx = 0;

        // Matrix multiplication of W and x
        for (int k = 0; k < d; ++k)
        {
            float W_element = W_gpu[tx * d + k];
            float x_element = x_gpu[k];
            Wx += W_element * x_element;
        }

        // Compute the linear response and the signed linear response with temp
        lr_out[tx] = Wx + b_gpu[tx];
        tslr_out[tx] = (0.5 / temp) * (1 - 2 * x_gpu[tx]) * (Wx + b_gpu[tx]);
    }
    """, arch='sm_60')

    func = mod.get_function("initial")

    # format characters for prepare() are defined at
    # https://docs.python.org/2/library/struct.html
    func.prepare("PPPPPif")

    dsize_nparray = np.zeros((self.d,), dtype=np.float32)

    lr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
    slr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
    tslr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)

    grid = (1, 1)
    block = (self.d, 1, 1)
    # block = (self.d, self.d, 1)

    func.prepared_call(grid, block, tslr_gpu, lr_gpu, self.W_gpu,
                       self.b_gpu, self.x_gpu, self.d, self.temp)

    return tslr_gpu, lr_gpu
```
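In case it helps with diagnosis: `cuModuleLoadDataEx ... device kernel image is invalid` typically means the cubin was compiled for a different GPU architecture than the one trying to load it, and hardcoding `arch='sm_60'` ties the build to compute-capability-6.0 devices. Below is a minimal sketch to print which `arch` string the current device actually expects; `arch_for` is a helper name of my own, not part of PyCUDA.

```
# Sketch: derive the SourceModule arch string for the current GPU.
# arch_for() is a hypothetical helper, not a PyCUDA API.

def arch_for(compute_capability):
    """Map a (major, minor) compute-capability tuple to an arch string."""
    major, minor = compute_capability
    return "sm_%d%d" % (major, minor)

if __name__ == "__main__":
    try:
        import pycuda.autoinit  # noqa: F401  (creates a context on device 0)
        import pycuda.driver as cuda

        cc = cuda.Device(0).compute_capability()  # e.g. (6, 0) on a P100
        print("this device expects arch=%r" % arch_for(cc))
    except ImportError:
        print("pycuda not available; run this on the failing workstation")
```

Running this on both workstations would show whether the second machine's GPU really is `sm_60`; if not, either pass the matching `arch` or omit the `arch` keyword so PyCUDA derives it from the current context.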
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda
