tra added a comment.

Playing devil's advocate, I've got to ask -- do we even want to support JIT?

JIT brings more trouble than benefits.

- substantial start-up time on nontrivial apps. The last time I launched a 
TensorFlow app whose kernels needed to be JIT'ed, the JIT compilation took 
about half an hour to finish.
- substantial increase in executable size. Statically linked TensorFlow apps 
are already pushing the limits of executables built with the small memory 
model (-mcmodel=small is the default for clang and gcc, AFAICT).
- it is very easy to compile for the wrong GPU and not notice the mistake, 
because JIT will keep the app running from the embedded PTX (see the sketch 
after this list).
- makes executables and tests non-hermetic -- the code that runs on the GPU 
(and thus the app's behavior) will depend on the particular driver version 
the app uses at runtime.
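
As a concrete illustration of the wrong-GPU point: a hypothetical build (the 
file and arch names are made up; --offload-arch is clang's flag for picking 
the target GPU):

  # Build for sm_70 only; by default clang embeds both sm_70 SASS and PTX.
  clang++ -x cuda app.cu --offload-arch=sm_70 -o app
  # Run on an sm_80 machine: the sm_70 SASS is unusable there, so the driver
  # silently JIT-compiles the embedded PTX at start-up. The app keeps
  # running, and nothing tells you it was built for the wrong GPU.
  ./app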

Benefits: It *may* allow us to run a miscompiled/outdated CUDA app. Whether 
it's actually a benefit is questionable. To me it looks like a way to paper 
over a problem.

We (Google) have experienced all of the above and ended up disabling PTX 
JIT'ing altogether.

That said, we do embed PTX by default at the moment, so this patch does not 
really change the status quo. I'm not opposed to it, as long as we can 
disable PTX embedding if we need or want to.
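
For reference, a sketch of how we do that today, using clang's existing 
--no-cuda-include-ptx= option (file and arch names are made up):

  # Embed SASS only, no PTX: no JIT fallback, so the binary runs only on the
  # architectures it was built for, but it is hermetic and smaller.
  clang++ -x cuda app.cu --offload-arch=sm_80 --no-cuda-include-ptx=all -o app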


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127901/new/

https://reviews.llvm.org/D127901
