http://sbel.wisc.edu/Forum/index.php?topic=19.0
Topic: getting CUDA assembly code with nvcc
Dan Negrut
« on: September 17, 2008, 04:51:55 PM »
In the last lecture I mentioned that you want as many of your variables as possible stored in registers and/or shared memory.
However, if you have more variables than the available registers can hold, some of them might end up as local variables, which are stored in local memory. Local memory resides in off-chip device memory, the same memory that backs global memory, so it is just as slow to access. This comes at a performance price: what you hoped to access with no overhead and wanted to use intensively ends up stored in slow device memory and is still used intensively. Hence the performance hit.
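To make the point concrete, here is a minimal CUDA sketch of a kernel whose per-thread data is unlikely to fit in registers. The kernel name `scratchKernel`, the array size `SCRATCH`, and the index pattern are hypothetical and chosen only for illustration. A per-thread array that is indexed with a value known only at run time is typically placed in local memory by the compiler, although the exact outcome depends on the toolkit version and the target GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SCRATCH 64  // hypothetical size, chosen only for illustration

// Each thread fills a private array and then reads one element at an index
// that is only known at run time. Such an array typically cannot be kept in
// registers, so the compiler is likely to place it in local memory.
__global__ void scratchKernel(const int *idx, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float scratch[SCRATCH];                  // per-thread array
    for (int i = 0; i < SCRATCH; ++i)
        scratch[i] = (float)(tid + i);

    out[tid] = scratch[idx[tid] % SCRATCH];  // run-time index
}

int main()
{
    const int n = 256;
    int   h_idx[n];
    float h_out[n];
    for (int i = 0; i < n; ++i)
        h_idx[i] = (i * 7) % SCRATCH;        // arbitrary non-negative indices

    int   *d_idx = 0;
    float *d_out = 0;
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_idx, h_idx, n * sizeof(int), cudaMemcpyHostToDevice);

    scratchKernel<<<(n + 127) / 128, 128>>>(d_idx, d_out, n);

    // This copy also waits for the kernel and reports any earlier error.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("out[5] = %f\n", h_out[5]);

    cudaFree(d_idx);
    cudaFree(d_out);
    return 0;
}
```

If the array does land in local memory, it should appear in the generated assembly as a declaration in the `.local` state space rather than as registers.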
To figure out how many registers your code uses, as well as how much shared memory it uses, you need to take a look at the assembly code that is output when the nvcc driver is invoked on your CUDA source file.
In other words, let's say that your typical command line to compile the .cu file and turn it into a binary looks like this (very typical; you can get it from the build log in Developer Studio):
nvcc.exe -ccbin "C:\Program Files\Microsoft Visual Studio 8\VC\bin" -c -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Wp64 /O2 /Zi /MT " -I"C:\CUDA\include" -I"C:\Progra~1\NVIDIA~1\NVIDIA~1\common\inc" -o Release\driver.obj driver.cu
This needs to be modified to:
nvcc.exe -ptx -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Wp64 /O2 /Zi /MT " -I"C:\CUDA\include" -I"C:\Progra~1\NVIDIA~1\NVIDIA~1\common\inc" driver.cu
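The most important thing is that you tell the nvcc driver not to generate a binary (drop the -c option) but rather to generate assembly code (through the -ptx option). You also need to drop some fluff to get it to work: here the -ccbin option, which only tells nvcc where the host compiler lives, and the -o output file.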
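As a complement to reading the assembly, the numbers can also be queried from inside a program. The sketch below assumes a CUDA toolkit that provides the runtime call `cudaFuncGetAttributes`; the kernel `scaleKernel` and its 128-element shared tile are hypothetical and serve only as something to inspect. Note also that PTX is an intermediate assembly: the final register allocation is done afterwards by ptxas, so the register count reported here, or by compiling with `--ptxas-options=-v`, may differ from what the PTX listing suggests.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with a statically sized shared-memory tile,
// used here only so that there is something to query.
__global__ void scaleKernel(const float *in, float *out, int n)
{
    __shared__ float tile[128];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (tid < n) tile[threadIdx.x] = in[tid];
    __syncthreads();
    if (tid < n) out[tid] = 2.0f * tile[threadIdx.x];
}

int main()
{
    // Ask the runtime for the resource usage of the compiled kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread   : %d\n", attr.numRegs);
    printf("static shared memory   : %llu bytes per block\n",
           (unsigned long long)attr.sharedSizeBytes);
    printf("local memory per thread: %llu bytes\n",
           (unsigned long long)attr.localSizeBytes);
    return 0;
}
```

A non-zero local memory figure is a hint that some per-thread data did not fit in registers, which is the situation described at the top of this post.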
Note that I have also attached a snapshot of Developer Studio that shows the BuildLog.htm file (you get this when you build your solution in Dev Studio).
I hope this helps.
Dan