Hi! As an aside:
On 2022-02-03T21:00:50+0000, "Roger Sayle" <ro...@nextmovesoftware.com> wrote:
> the exact register usage of a nvptx kernel depends upon the version of
> the Cuda drivers being used (and the hardware)

Yeah, that's a "problem" -- or: "challenge"? ;-) The GCC/nvptx back end
is generating some rather high-level IR (PTX) targeting a "black hole":
we don't know what exactly the Nvidia/CUDA Driver's PTX -> SASS compiler
is going to do with it. (Well, a similar problem also exists for more
traditional ISAs if CPU microcode etc. is involved, but it's certainly
more severe here.)

Five years ago, I asked our then Nvidia PTX contact person for ideas on
"How to generate PTX code to the PTX -> SASS compiler's liking":

| We're currently looking into options for improving the PTX code generated
| by GCC's nvptx back end, and the question came up about how to
| generate PTX code to the PTX -> SASS compiler's liking? Is there any
| documentation available regarding this? (I say "PTX -> SASS compiler" as
| I don't think I know the proper name of it. For avoidance of doubt, I
| mean the "component" that sits between the PTX code we feed into
| cuLinkAddData, and what actually gets executed on the GPU as SASS code.
| Presumably the same "component" that is part of the "ptxas" tool?)
|
| As always, there are often many different variants for expressing the
| same thing. A few examples.
|
| [...]
|
| ;-) And so on, and so forth. Are there any generic recommendations,
| "best practice"?

The answer was:

| I don't know of any official documentation; in general we have tuned
| the backend to the PTX that we generate, so following that lead will
| give you the best results.

So, yeah. :-\ Understandable and not unexpected, though.


Regards
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Register court: München, HRB 106955