masahi commented on pull request #7669:
URL: https://github.com/apache/tvm/pull/7669#issuecomment-800543774


   OK, updated to cast to float32 only in the problematic case, which is Vulkan 
(VK) + dynamic input. I think this is an acceptable solution for now. Of course, 
the best solution is to implement TIR-level CSE: the host is doing the same 
computation anyway, so there is no point in computing log2 etc. on the device.
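   For illustration only, here is a minimal Python sketch (not TVM code; all 
helper names are hypothetical). It assumes the device-side expression is 
`ceil(log2(n))` of the dynamic input size, computed as cast, log2, ceil. It 
shows one caveat of the float32 cast: float32 represents every integer exactly 
only up to 2**24, so a larger `n` can round down and give the wrong result. It 
also shows an exact integer-only version of the kind the host could compute 
once and pass to the kernel.

```python
import math
import struct


def to_float32(n: int) -> float:
    """Round an integer to the nearest float32, mimicking an int -> float32 cast."""
    return struct.unpack("f", struct.pack("f", float(n)))[0]


def ceil_log2_float(n: int, use_float32: bool) -> int:
    """ceil(log2(n)) the way a cast/log2/ceil expression computes it.

    Only the cast is emulated at float32 precision; log2 itself runs in
    Python's float64, so this approximates a float32 kernel.
    """
    x = to_float32(n) if use_float32 else float(n)
    return math.ceil(math.log2(x))


def ceil_log2_int(n: int) -> int:
    """Exact integer ceil(log2(n)) for n >= 1, with no floating point at all."""
    return (n - 1).bit_length()


# For small sizes, all three variants agree.
for n in (1, 2, 3, 1000, 4096):
    assert ceil_log2_float(n, False) == ceil_log2_float(n, True) == ceil_log2_int(n)

# Above 2**24, float32 can no longer represent every integer exactly.
n = 2**24 + 1
print(ceil_log2_float(n, use_float32=False))  # 25
print(ceil_log2_float(n, use_float32=True))   # 24 (n rounded down to 2**24)
print(ceil_log2_int(n))                       # 25
```

   So the float32 cast is fine for typical sizes, and an integer computation 
avoids the question of float precision entirely.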
   
   Interestingly, the TIR mergepath kernel used in sort, which is also littered 
with GLSL log2 and ceil, doesn't cast to float64 before log2 in the GPU IR. If 
you look at the IR dump 
https://gist.github.com/masahi/c0979c61907af15f9924b3b3d72fe6a7, there is no 
`float64` anywhere. But the TIR scan downsweep kernel does have a cast to 
`float64`. 
   
   It could also be the case that our SPIR-V codegen for int64-to-float64 casts 
is busted, but I haven't checked. 

