Hello,

I noticed something strange recently. Consider the following kernel:

__global__ void test(float *out)
{
    float a[2] = {0, 0};
    a[0] = 1;
    a[1] = 2;
    out[0] = a[0];
    out[1] = a[1];
}

As far as I understand, the array a[2] should be placed in registers.
The PTX that nvcc produces when I compile a simple .cu file containing
this kernel seems to confirm that:

       .entry test (
               .param .u32 __cudaparm_test_out)
       {
       .reg .u32 %r<3>;
       .reg .f32 %f<4>;
       .loc    15      192     0
$LBB1_test:
       .loc    15      198     0
       ld.param.u32    %r1, [__cudaparm_test_out];
       mov.f32         %f1, 0f3f800000;        // 1
       st.global.f32   [%r1+0], %f1;
       .loc    15      199     0
       mov.f32         %f2, 0f40000000;        // 2
       st.global.f32   [%r1+4], %f2;
       .loc    15      200     0
       exit;
$LDWend_test:
       } // test

But when I compile the same kernel with PyCUDA, the resulting function
reports shared_size_bytes == 20. Can anyone explain why the shared
memory size is non-zero? I am completely at a loss here.
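A minimal PyCUDA sketch of the check described above, assuming a CUDA-capable GPU and a working PyCUDA installation. It compiles the kernel with SourceModule and reads the per-kernel resource attributes through Function.get_attribute and the pycuda.driver.function_attribute enum; the post's shared_size_bytes is the older shorthand for the SHARED_SIZE_BYTES attribute. The names query_kernel_attributes and ATTRIBUTE_NAMES are hypothetical, chosen only for this sketch.

```python
# Reproduction sketch (assumes a CUDA-capable GPU and PyCUDA are available).
# The kernel source is the one from the post.
KERNEL_SOURCE = r"""
__global__ void test(float *out)
{
    float a[2] = {0, 0};
    a[0] = 1;
    a[1] = 2;
    out[0] = a[0];
    out[1] = a[1];
}
"""

# Hypothetical selection of attributes to query; each name must match a
# member of pycuda.driver.function_attribute.
ATTRIBUTE_NAMES = (
    "SHARED_SIZE_BYTES",
    "LOCAL_SIZE_BYTES",
    "CONST_SIZE_BYTES",
    "NUM_REGS",
)


def query_kernel_attributes(source=KERNEL_SOURCE, name="test"):
    """Compile `source` with PyCUDA and return the resource attributes of kernel `name`."""
    # Imported lazily so that merely loading this file does not require a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a context on a default device)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    # SourceModule wraps the source in extern "C" by default, so the kernel
    # keeps its unmangled name "test".
    func = SourceModule(source).get_function(name)
    return {
        attr: func.get_attribute(getattr(cuda.function_attribute, attr))
        for attr in ATTRIBUTE_NAMES
    }
```

Printing the dict returned by query_kernel_attributes() shows the figures for the current device; the values depend on the GPU architecture and the CUDA toolkit version.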

_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
