Andrew Stubbs wrote:
/tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range
           .amdhsa_next_free_vgpr        516
                                         ^~~
[Obviously, likewise for libgomp.c++/..
Hmm, supposedly there are 768 registers, allocated in groups of 12 on gfx1100 (8 on other devices). On wavefrontsize64 you have to double the number, because that field actually counts 32-lane registers. The ISA can only reference 256 registers, so the limit here should be 512. (The remaining registers are intended for other wavefronts to use.)

But 256 is not divisible by 12, and it looks like we've rounded up. I guess we need to set the limit at 252 (504), for gfx1100.
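
As a sanity check on that arithmetic, here is a minimal C++ sketch. The helper and constant names are made up for illustration and are not taken from GCC or LLVM, and it assumes the rounding to the allocation granule is applied to the doubled (wavefrontsize64) count:

```cpp
#include <cassert>

// Hypothetical helpers for illustration only (not GCC or LLVM code).
constexpr unsigned round_up(unsigned n, unsigned granule) {
  return (n + granule - 1) / granule * granule;
}
constexpr unsigned round_down(unsigned n, unsigned granule) {
  return n / granule * granule;
}

constexpr unsigned kAddressable = 256;  // registers the ISA can reference
constexpr unsigned kGranule = 12;       // allocation group size quoted for gfx1100
constexpr unsigned kWave64Factor = 2;   // the field counts 32-lane registers

// Rounding the doubled limit up to the granule overshoots it: 512 -> 516.
static_assert(round_up(kAddressable * kWave64Factor, kGranule) == 516,
              "rounding up exceeds the 512 limit");

// Rounding down instead stays in range: 252, or 504 once doubled.
static_assert(round_down(kAddressable, kGranule) == 252,
              "largest multiple of 12 not above 256");
static_assert(round_down(kAddressable, kGranule) * kWave64Factor == 504,
              "doubled limit for wavefrontsize64");
```

With a granule of 8, as quoted for the other devices, 256 is already a multiple and no adjustment would be needed.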

BTW: The LLVM source code has,
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1066

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 512;
  if (!isGFX10Plus(*STI))
    return 256;
  bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 1536 : 768;
  return IsWave32 ? 1024 : 512;
}


Tobias
