AntonMoberg opened a new pull request, #17673:
URL: https://github.com/apache/tvm/pull/17673

   Fixes a bug where CUDA codegen produces faulty code when a vectorizable 
BufferLoadNode contains a Float8 type.
   
   Codegen previously generated the invalid expression 
"make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y])", where "param_0" is of 
type "__nv_fp8_e5m2* __restrict__".
   
   This commit adds the missing "is_float8()" check to 
CodeGenCUDA::PrintVecElemLoadExpr, which is called for vectorizable 
BufferLoadNodes. With the check in place, codegen instead generates the 
expression "__nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]), 
static_cast<float>(param_0[v_.y])))".
   
   Additionally, this commit removes the "make_" prefix that was added for 
float8 in CodeGenCUDA::PrintVecConstructor, as the correct way to instantiate 
an __nv_fp8x2_[e5m2/e4m3] is through the "__nv_fp8x2_[e5m2/e4m3]" constructor 
itself.
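
   To make the two changes concrete, here is a minimal, standalone sketch of 
the string-building logic described above. It is not the code from this PR: 
the helper names Fp8x2LoadExpr and VecConstructorName, and their parameters, 
are hypothetical and exist only for illustration. The real logic lives in 
CodeGenCUDA::PrintVecElemLoadExpr and CodeGenCUDA::PrintVecConstructor.

```cpp
#include <string>

// Hypothetical helper (not TVM's API): builds the CUDA expression that packs
// two scalar fp8 loads into a 2-lane fp8 vector. Each lane is widened to
// float first, and the pair is passed to the vector type's own constructor.
// `fmt` is "e5m2" or "e4m3"; `buf` and `idx` are the buffer and index names.
std::string Fp8x2LoadExpr(const std::string& fmt, const std::string& buf,
                          const std::string& idx) {
  const std::string vec_type = "__nv_fp8x2_" + fmt;
  auto lane = [&](const char* component) {
    return "static_cast<float>(" + buf + "[" + idx + "." + component + "])";
  };
  return vec_type + "(make_float2(" + lane("x") + ", " + lane("y") + "))";
}

// Hypothetical helper (not TVM's API): chooses the constructor name for a
// vector type. Float8 vector types are built through the type's constructor,
// so they get no "make_" prefix; other types are assumed to use "make_<type>".
std::string VecConstructorName(const std::string& type_name, bool is_float8) {
  return is_float8 ? type_name : "make_" + type_name;
}
```

   Under these assumptions, Fp8x2LoadExpr("e5m2", "param_0", "v_") yields the 
corrected expression quoted in the description, and 
VecConstructorName("__nv_fp8x2_e5m2", true) returns the bare type name rather 
than "make___nv_fp8x2_e5m2".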

