masahi edited a comment on pull request #7669:
URL: https://github.com/apache/tvm/pull/7669#issuecomment-800543774


   OK, I updated the PR to cast to float32 only in the problematic case, which 
is Vulkan (VK) with dynamic input shapes in the TIR scan. I think this is an 
acceptable solution for now. Of course, the best solution is to implement 
TIR-level common subexpression elimination (CSE): the host is doing the same 
computation anyway, so there is no point in computing log2 etc. on the device.
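
   As a rough illustration of the "compute it on the host" idea, here is a 
minimal Python sketch, not what this PR does: it assumes the quantity of 
interest is the number of scan levels, ceil(log2(n)), and computes it with 
integer arithmetic only, so no float cast is involved. The function names are 
hypothetical.

```python
import math


def ceil_log2(n: int) -> int:
    """Return ceil(log2(n)) for n >= 1 using integer arithmetic only.

    Hypothetical helper for illustration; not part of TVM.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # For n >= 1, the bit length of (n - 1) equals ceil(log2(n)).
    return (n - 1).bit_length()


def ceil_log2_float(n: int) -> int:
    """Floating-point version, mirroring the log2-then-ceil pattern."""
    return math.ceil(math.log2(n))


if __name__ == "__main__":
    for n in (1, 2, 3, 8, 9, 1000):
        print(n, ceil_log2(n), ceil_log2_float(n))
```

   If the level count were computed once on the host like this and passed to 
the kernel as a scalar, the device code would not need log2 at all.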
   
   Interestingly, the TIR mergepath kernel used in sort, which is also 
littered with GLSL log2 and ceil calls, doesn't cast to float64 before log2 in 
the GPU IR. If you look at the IR dump 
https://gist.github.com/masahi/c0979c61907af15f9924b3b3d72fe6a7, there is no 
`float64` anywhere. The TIR scan downsweep kernel, however, does contain a 
cast to `float64`. So I removed the cast to float32 in TIR sort.
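
   For anyone who wants to repeat the "is there a `float64` anywhere" check on 
a saved IR dump, a small sketch follows. The helper name and the sample text 
are made up for illustration; the sample is not real TVM output.

```python
def lines_mentioning(ir_text: str, dtype: str) -> list:
    """Return (line_number, stripped_line) pairs whose text contains dtype.

    A plain substring match is used, so vector types such as "float64x4"
    are reported as well.
    """
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        if dtype in line:
            hits.append((lineno, line.strip()))
    return hits


# Made-up text standing in for an IR dump (not real TVM output).
SAMPLE_IR = """\
x = log2(cast(float32, n))
y = ceil(x)
z = log2(cast(float64, m))
"""

if __name__ == "__main__":
    for lineno, line in lines_mentioning(SAMPLE_IR, "float64"):
        print(f"{lineno}: {line}")
```

   An empty result for `float64` would match what the sort dump shows.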
   
   It could also be that our SPIR-V codegen for the int64 to float64 cast is 
broken, but I haven't checked. Another odd thing is that GLSL log2 works 
correctly if the input size is static.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

