Hahnfeld added a comment.

In https://reviews.llvm.org/D47849#1192134, @gtbercea wrote:

> This patch is concerned with calling device functions when you're on the
> device. The correctness issues you mention are orthogonal to this and should
> be handled by another patch. I don't think this patch should be held up any
> longer.

I'm confused by now, could you please highlight the point that I'm missing? IIRC you started to work on this to fix the problem with inline assembly (see https://reviews.llvm.org/D47849#1125019). AFAICS this patch fixes declarations of math functions but you still cannot include `math.h` which most "correct" codes do.

In https://reviews.llvm.org/D47849#1170670, @tra wrote:

> The rumors of "high performance" functions in the libdevice are somewhat
> exaggerated, IMO. If you take a look at the IR in the libdevice of recent
> CUDA version, you will see that a lot of the functions just call their llvm
> counterpart. If it turns out that in some case llvm generates slower code
> than what nvidia provides, I'm sure it will be possible to implement a
> reasonably fast replacement.

So regarding performance it's not yet clear to me which cases actually benefit: Is there a particular function that is slow if LLVM's backend resolves the call vs. the wrapper script directly calls libdevice?
If I understand @tra's comment correctly, I think we should have clear evidence (i.e. a small "benchmark") that this patch actually improves performance.

Repository:
  rC Clang

https://reviews.llvm.org/D47849

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits