ThomasRaoux wrote: @AlexMaclean, this PR doesn't seem to be NFC. It generates different PTX than before. In particular, I see extra `mov.b32 %r, global_smem;` instructions that seem to generate different SASS and cause regressions in some Triton workloads. Before this PR I would only see one of those moves; now I see multiple. I haven't debugged why yet.
Is this something you have noticed?

https://github.com/llvm/llvm-project/pull/145581
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits