256 threads per block and as many blocks as there are multiprocessors,
so 30 for a GTX 280 and 27 for a GTX 260 SP216, for example.
This was optimal when I did timing runs a while ago.
If you do the calculation, this means there are 8 warps per MP and
32 threads per physical core.
These values are probably best since they max out the number of threads
while avoiding context switches between blocks, which add (a small)
overhead because the lookup tables have to be copied into shared memory
again for each block.

Blocks and threads can be configured like this:

--device cuda:blocks=23:threads=42

> what's the occupancy rate of current CUDA kernels on recent GPUs with compute 
> capability 1.3?
> 


________________________________________________________________
Neu: WEB.DE Doppel-FLAT mit Internet-Flatrate + Telefon-Flatrate
für nur 19,99 Euro/mtl.!* http://produkte.web.de/go/02/

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
