[PyCUDA] Incorrect shared memory size for kernel

2010-02-07 Thread Bogdan Opanchuk
Hello,

I noticed something strange recently. Consider the following kernel:

    __global__ void test(float *out)
    {
        float a[2] = {0, 0};
        a[0] = 1;
        a[1] = 2;
        out[0] = a[0];
        out[1] = a[1];
    }

As far as I understand, the array a[2] should go into registers. According to PTX

Re: [PyCUDA] Incorrect shared memory size for kernel

2010-02-07 Thread Andreas Klöckner
On Sunday, 7 February 2010, Bogdan Opanchuk wrote:

    .entry test (
        .param .u32 __cudaparm_test_out)
    {
        .reg .u32 %r<3>;
        .reg .f32 %f<4>;
        .loc 15 192 0
    $LBB1_test:
        .loc 15 198 0
        ld.param.u32 %r1,