FranckQC edited a comment on pull request #9482: URL: https://github.com/apache/tvm/pull/9482#issuecomment-969151523
> Hi @FranckQC, I wanted TIR-level CSE for a long time, so very excited to see this!
>
> What I wanted to do is to eliminate common expressions that span across the host and GPU - for example, in GPU `sort` kernel, I need to make `log2(N)` GPU kernel calls from the host to sort the input bottom up. In principle, `log2(N)` needs to be computed once by the host and passed to the GPU kernel, but since we cannot CSE the `log2(N)` expression that appears both in the host and GPU kernel, right now the GPU sort kernel is littered with `log2(N)` compute like this (note a lot of calls to `call_spirv_pure_glsl450`, which would be totally unnecessary if we had TIR-level CSE) https://gist.github.com/masahi/7a755ef67009e1a836e3212c53cf496f
>
> Is this PR going to solve my problem?

Hi @masahi,

Thanks a lot for the kind words! I'm happy to read that this new pass might be useful to you.

In principle, every redundant subterm that is eligible for being commoned out (i.e., that does not contain function calls, etc.) will be commoned out. There are also a few other minor restrictions due to some specifics of TVM, but these are rare.

Unfortunately, in the TIR code snippet that you uploaded, the redundant subexpression seems to be just a function call. Function calls can't be commoned out into a new variable by the CSE pass, because the pass has no guarantee that the function is pure, i.e. that it has no side effects and always produces the same outputs for the same inputs. Without this guarantee, commoning out such function calls could change the program's semantics, and preserving the semantics of the program is vital, so it is not done.

I can imagine relaxing this restriction for functions that are guaranteed to have no side effects (and which are therefore "functions" in the mathematical sense of the term), but that would be an extension to implement in the future, and it would rely on some "NoSideEffect" tag on functions.

However, please note that if you had other redundancies, this CSE pass would common out all of those that are eligible. For instance, assume that the term `(f(42) + f(42)) + (x*y + x*y)` appears somewhere. It is not eligible in its entirety, as it contains function calls. But its subterm `(x*y + x*y)` is eligible, so the redundant `x*y` inside it will be commoned out.

In short, the CSE pass, as implemented, always tries to common out all the redundant computations that are eligible, and it does so by looking from bigger subterms to smaller subterms.

Does that answer your question? Please do not hesitate to tell me if you need help trying the pass.

Kind regards.
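
P.S. To make the example term above concrete, here is a minimal Python sketch of the "bigger subterms first, skip anything containing a call" idea. It works on a toy expression tree, not on TIR, and it is not the code of the pass in this PR: the classes `Var`, `Const`, `BinOp`, `Call`, the helper `cse`, and the `cse_var_N` naming are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical toy expression tree (frozen so terms are hashable).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


def subterms(e):
    """Yield e and all of its subterms, once per occurrence."""
    yield e
    if isinstance(e, BinOp):
        yield from subterms(e.lhs)
        yield from subterms(e.rhs)
    elif isinstance(e, Call):
        for arg in e.args:
            yield from subterms(arg)


def contains_call(e):
    """A term containing any call is treated as not eligible."""
    return any(isinstance(s, Call) for s in subterms(e))


def size(e):
    return sum(1 for _ in subterms(e))


def replace(e, target, var):
    """Return e with every occurrence of target replaced by var."""
    if e == target:
        return var
    if isinstance(e, BinOp):
        return BinOp(e.op, replace(e.lhs, target, var), replace(e.rhs, target, var))
    if isinstance(e, Call):
        return Call(e.func, tuple(replace(a, target, var) for a in e.args))
    return e


def cse(e):
    """Common out repeated call-free BinOps, largest first.

    Returns (bindings, rewritten), where bindings is a list of
    (new variable, term it stands for) pairs.
    """
    bindings = []
    while True:
        counts = Counter(subterms(e))
        candidates = [
            s for s, n in counts.items()
            if n >= 2 and isinstance(s, BinOp) and not contains_call(s)
        ]
        if not candidates:
            return bindings, e
        target = max(candidates, key=size)  # bigger subterms first
        var = Var(f"cse_var_{len(bindings) + 1}")
        bindings.append((var, target))
        e = replace(e, target, var)


# The example term: (f(42) + f(42)) + (x*y + x*y)
x, y = Var("x"), Var("y")
f42 = Call("f", (Const(42),))
xy = BinOp("*", x, y)
term = BinOp("+", BinOp("+", f42, f42), BinOp("+", xy, xy))

bindings, rewritten = cse(term)
print(bindings)   # one binding: cse_var_1 stands for x*y
print(rewritten)  # f(42) + f(42) is left untouched
```

In this sketch the two `f(42)` calls stay as they are, since nothing tells the toy `cse` that `f` is pure, while `x*y` is bound once and reused.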
