jlebar added a comment.

> But people also don't expect IEEE compliance on GPUs


Is that true?  You have a lot more experience with this than I do, but my 
observation of nvidia's hardware is that it has added *more* IEEE 
compliance as it's matured.  For example, older hardware didn't support 
denormals, but newer chips do.  Surely that's in response to user demand.

One of our goals with CUDA in clang is to make device code as similar as 
possible to host code.  Throwing out IEEE compliance seems counter to that goal.

I also don't see a bright line here.  If we can contract to FMAs to our 
heart's content, where do we draw the line with respect to IEEE compliance?  
Do we turn on flush-denormals-to-zero by default?  Do we use the approximate 
transcendental functions instead of the more accurate ones?  Do we assume 
floating-point arithmetic is associative?  What is the principle that leads 
us to do FMAs but not these other optimizations?

In addition, CUDA != GPUs.  Maybe this is something to turn on by default for 
NVPTX, although I'm still pretty uncomfortable with that.  Prior art in other 
compilers is interesting, but I think it's notable that clang doesn't do this 
for any other targets (afaict?) despite the fact that gcc does.

The main argument I see for this is "nvcc does it, and people will think clang 
is slow if we don't".  That's maybe not a bad argument, but it makes me sad.  :(


http://reviews.llvm.org/D20341



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits