FranckQC edited a comment on pull request #9482:
URL: https://github.com/apache/tvm/pull/9482#issuecomment-969151523


   > Hi @FranckQC, I have wanted TIR-level CSE for a long time, so I am very 
excited to see this!
   > 
   > What I wanted to do is eliminate common expressions that span the host 
and the GPU. For example, in the GPU `sort` kernel, I need to make `log2(N)` 
GPU kernel calls from the host to sort the input bottom-up. In principle, 
`log2(N)` needs to be computed once by the host and passed to the GPU kernel, 
but since we cannot CSE a `log2(N)` expression that appears in both the host 
and the GPU kernel, right now the GPU sort kernel is littered with `log2(N)` 
computations like this (note the many calls to `call_spirv_pure_glsl450`, 
which would be totally unnecessary if we had TIR-level CSE): 
https://gist.github.com/masahi/7a755ef67009e1a836e3212c53cf496f
   > 
   > Is this PR going to solve my problem?
   
   Hi @masahi 
   Thanks a lot for the kind words! I'm happy to read that this new pass might 
be useful to you.
   Yes, in principle, every redundant subterm that is eligible for being 
commoned out (i.e., one that does not contain function calls, etc.) will be 
commoned out. There are also a few restrictions due to some specifics of TVM, 
but these are rare.
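   To give a rough idea of what "commoning out" means here, below is a toy 
sketch in plain Python. It is not the code of the pass and does not use any 
TVM API: the tuple-based expression format, the `"call"` marker, the function 
names, and the `cse_var_N` naming are all assumptions made for illustration. 
It repeatedly hoists the largest repeated, call-free subexpression of a single 
expression into a binding.

```python
from collections import Counter

# Toy expression trees (an assumption for this sketch, not TIR):
#   - a variable is a str, a constant is an int or float
#   - a compound node is a tuple (op, arg1, arg2, ...)
#   - the op name "call" marks a function call: ("call", "log2", "n")


def call_free(expr):
    """True if the expression contains no 'call' node."""
    if not isinstance(expr, tuple):
        return True
    if expr[0] == "call":
        return False
    return all(call_free(arg) for arg in expr[1:])


def count_subexprs(expr, counts):
    """Count every occurrence of every compound subexpression."""
    if isinstance(expr, tuple):
        counts[expr] += 1
        for arg in expr[1:]:
            count_subexprs(arg, counts)


def size(expr):
    """Number of nodes in the expression tree."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(arg) for arg in expr[1:])


def substitute(expr, target, var):
    """Replace every occurrence of `target` in `expr` by the name `var`."""
    if expr == target:
        return var
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(a, target, var) for a in expr[1:])
    return expr


def toy_cse(expr):
    """Return (bindings, body): bindings is a list of (name, value) pairs."""
    bindings = []
    while True:
        counts = Counter()
        count_subexprs(expr, counts)
        # Eligible: occurs more than once and contains no function call.
        candidates = [e for e, n in counts.items() if n > 1 and call_free(e)]
        if not candidates:
            return bindings, expr
        target = max(candidates, key=size)  # hoist the largest one first
        var = f"cse_var_{len(bindings) + 1}"
        expr = substitute(expr, target, var)
        bindings.append((var, target))


if __name__ == "__main__":
    # (x * y) + (x * y): the product is repeated and call-free.
    print(toy_cse(("add", ("mul", "x", "y"), ("mul", "x", "y"))))
    # log2(n) + log2(n): repeated, but skipped by this sketch's "no calls" rule.
    print(toy_cse(("add", ("call", "log2", "n"), ("call", "log2", "n"))))
```

   In this sketch the first expression becomes `cse_var_1 + cse_var_1` with 
`cse_var_1 = x * y`, while the second one is left unchanged because it only 
repeats a function call. The sketch only looks at one expression at a time and 
ignores redundancies between the hoisted values themselves.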
   Do you have a little snippet of your TIR code that has some 
redundancies? I can try to tell whether the CSE pass will be able to optimize it.
   Also, please do not hesitate to play with the pass and let me know whether 
it does what you hoped for. I can help, of course.
   
   Kind regards.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.