Hello, I noticed something strange recently. Consider the following kernel:

    __global__ void test(float *out)
    {
        float a[2] = {0, 0};
        a[0] = 1;
        a[1] = 2;
        out[0] = a[0];
        out[1] = a[1];
    }

As far as I understand, the array a (declared as float a[2]) should be placed in registers. According to the PTX file that nvcc produces when I compile a simple .cu file, this seems to be the case:

    .entry test (
        .param .u32 __cudaparm_test_out)
    {
        .reg .u32 %r<3>;
        .reg .f32 %f<4>;
        .loc 15 192 0
    $LBB1_test:
        .loc 15 198 0
        ld.param.u32 %r1, [__cudaparm_test_out];
        mov.f32 %f1, 0f3f800000; // 1
        st.global.f32 [%r1+0], %f1;
        .loc 15 199 0
        mov.f32 %f2, 0f40000000; // 2
        st.global.f32 [%r1+4], %f2;
        .loc 15 200 0
        exit;
    $LDWend_test:
    } // test

But when I compile this kernel with PyCUDA, the resulting function has the attribute shared_size_bytes == 20. Can anyone please explain why the size of shared memory is non-zero? I am completely at a loss here.
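
For anyone who wants to reproduce this, here is a minimal sketch of how the attribute can be read. It assumes a working PyCUDA installation and a CUDA-capable GPU; the helper names resource_usage and compile_and_report are mine and purely illustrative, and the script is not necessarily the exact one that produced the value above.

```python
# Kernel from the post, kept as a string so PyCUDA can compile it at run time.
KERNEL_SOURCE = """
__global__ void test(float *out)
{
    float a[2] = {0, 0};
    a[0] = 1;
    a[1] = 2;
    out[0] = a[0];
    out[1] = a[1];
}
"""


def resource_usage(func):
    """Collect the resource attributes of a compiled kernel into a dict.

    `func` is expected to be a pycuda.driver.Function (hypothetical helper,
    not part of PyCUDA itself).
    """
    return {
        "num_regs": func.num_regs,
        "shared_size_bytes": func.shared_size_bytes,
        "local_size_bytes": func.local_size_bytes,
    }


def compile_and_report():
    """Compile the kernel and print its resource usage (needs a GPU)."""
    # Imported lazily so the module can be loaded without PyCUDA installed.
    import pycuda.autoinit  # noqa: F401  creates a context on the default device
    from pycuda.compiler import SourceModule

    module = SourceModule(KERNEL_SOURCE)
    func = module.get_function("test")
    usage = resource_usage(func)
    print(usage)
    return usage
```

Calling compile_and_report() prints the register count together with the shared and local memory sizes, so the reported shared_size_bytes can be compared directly against what the PTX suggests.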