AntonMoberg opened a new pull request, #17673: URL: https://github.com/apache/tvm/pull/17673
Fixes a bug where CUDA codegen produces invalid code when a vectorizable BufferLoadNode has a Float8 type. Codegen emitted the invalid expression "make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y])", where "param_0" is of type "__nv_fp8_e5m2* __restrict__". This commit adds the missing "is_float8()" check to CodeGenCUDA::PrintVecElemLoadExpr, which is called for vectorizable BufferLoadNodes, so that codegen instead emits the valid expression "__nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]), static_cast<float>(param_0[v_.y])))".

Additionally, this commit removes the "make_" prefix added for float8 in CodeGenCUDA::PrintVecConstructor, since the correct way to instantiate a "__nv_fp8x2_[e5m2/e4m3]" is through the "__nv_fp8x2_[e5m2/e4m3]" constructor itself.
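
For illustration only, the difference between the faulty and the fixed load expression can be sketched with a small standalone string builder. This is not the code in CodeGenCUDA; the helper names "BuggyFp8x2LoadExpr" and "FixedFp8x2LoadExpr" are hypothetical and exist only to show the shape of the emitted CUDA text, assuming a two-lane fp8 vector.

```cpp
#include <string>

// Hypothetical helper (not TVM's API): reproduces the faulty output, where a
// "make_" prefix is put in front of the fp8x2 type and the raw fp8 lanes are
// passed directly. CUDA provides no such make_ function for fp8x2 types.
std::string BuggyFp8x2LoadExpr(const std::string& vec_type,
                               const std::string& lane0,
                               const std::string& lane1) {
  return "make_" + vec_type + "(" + lane0 + ", " + lane1 + ")";
}

// Hypothetical helper (not TVM's API): builds the fixed output, where each
// fp8 lane is cast to float, the two floats are packed with make_float2, and
// the result is passed to the fp8x2 type's own constructor.
std::string FixedFp8x2LoadExpr(const std::string& vec_type,
                               const std::string& lane0,
                               const std::string& lane1) {
  return vec_type + "(make_float2(static_cast<float>(" + lane0 +
         "), static_cast<float>(" + lane1 + ")))";
}
```

Calling FixedFp8x2LoadExpr("__nv_fp8x2_e5m2", "param_0[v_.x]", "param_0[v_.y]") yields the valid expression quoted in this pull request, while the buggy variant yields the "make___nv_fp8x2_e5m2(...)" form.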
