[clang] [NVPTX] Add support for -march=native in standalone NVPTX (PR #79373)

Artem Belevich via cfe-commits Thu, 25 Jan 2024 10:41:41 -0800

Artem-B wrote:

> It's not unspecified per-se, it just picks the one the CUDA driver assigned 
> to ID zero, so it will correspond to the layman using a default device if 
> loaded into CUDA.


The default "fastest card first" ordering is also somewhat flaky. First, the
"default" enumeration order depends on the environment (it could be by PCI ID
or "highest-performance-first"), which adds another external parameter the user
may or may not be aware of. Second, the "highest performance first" heuristic
is known to get it wrong. E.g. on my machine the CUDA runtime was picking a
puny newer card I used for graphics over a compute card two orders of magnitude
faster.

> I think that it's much less intuitive currently where we'll just have it 
> default to sm_52

That would fall under "any default choice for GPU will be wrong", with the
implication that it's up to the user to explicitly provide the correct set of
GPUs to target.

On the other hand, I'd be OK with `--offload-arch=native` translating into
"compile for *all* present GPU variants", with the option to further adjust the
selected set with the usual `--no-offload-arch=foo` if the user needs to. This
would at least produce code that runs on the machine where it's built, is
somewhat consistent, and is still adjustable by the user when the default
choice inevitably turns out to be wrong.
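
To make the proposed semantics concrete, here is a minimal Python sketch. It is
not the clang driver's implementation: `resolve_offload_archs` is a hypothetical
helper, and the `detected` list stands in for whatever architectures the
toolchain would discover on the build machine. It only illustrates "native
expands to every GPU variant present, and later removals still apply".

```python
def resolve_offload_archs(args, detected):
    """Return the final list of GPU architectures to compile for.

    args     -- command-line style arguments, processed left to right
    detected -- architectures of the GPUs present (assumed input, may repeat)
    """
    selected = []
    for arg in args:
        if arg.startswith("--offload-arch="):
            value = arg.split("=", 1)[1]
            # "native" expands to every distinct architecture present;
            # sorting is only there to make the result deterministic.
            archs = sorted(set(detected)) if value == "native" else [value]
            for arch in archs:
                if arch not in selected:
                    selected.append(arch)
        elif arg.startswith("--no-offload-arch="):
            value = arg.split("=", 1)[1]
            # Drop an architecture that an earlier argument selected.
            selected = [arch for arch in selected if arch != value]
    return selected


if __name__ == "__main__":
    gpus = ["sm_89", "sm_70", "sm_89"]  # hypothetical machine with three cards
    print(resolve_offload_archs(["--offload-arch=native"], gpus))
    print(resolve_offload_archs(
        ["--offload-arch=native", "--no-offload-arch=sm_70"], gpus))
```

With this model no single device is treated as "the" default, so the result
does not depend on enumeration order.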



https://github.com/llvm/llvm-project/pull/79373
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
