ThomasRaoux wrote:

> The problem seems to be that we're now reusing the `MOV_B64_i` instruction to 
> move the address of the global into a register. This instruction is marked as 
> `isAsCheapAsAMove = true` so we no longer bother to do CSE on it. This 
> doesn't necessarily seem like a problem or incorrect so I'm hesitant to "fix" 
> it by re-introducing a non-cheap mov instruction for global-addresses. We've 
> perturbed PTX a little bit and that can sometimes cause both regressions and 
> improvements.
> 
> @ThomasRaoux have you experimented with using maxnreg or --maxrregcount to 
> help PTXAS out here? If this kernel doesn't have a register target, this 
> might be the sort of thing that could change the compiler's guess about what 
> it should be.

Looking at the SASS, it doesn't use extra registers, but I do see extra
arithmetic in the loop. I need to take an ncu trace to understand why it makes
a significant difference, but it might just be the extra arithmetic and worse
scheduling.
If ptxas doesn't treat this move as a no-op, CSEing it would be nice. I'll
check if I can find a workaround; otherwise I'm not sure how to unblock this,
as the performance drop will be blocking our LLVM upgrade.

https://github.com/llvm/llvm-project/pull/145581
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits