masahi commented on pull request #9482: URL: https://github.com/apache/tvm/pull/9482#issuecomment-964681835
Hi @FranckQC, I've wanted TIR-level CSE for a long time, so I'm very excited to see this!

What I want to do is eliminate common expressions that span the host and the GPU. For example, in the GPU `sort` kernel, I need to make `log2(N)` GPU kernel calls from the host to sort the input bottom-up. In principle, `log2(N)` needs to be computed only once by the host and passed to the GPU kernel. But since we cannot CSE a `log2(N)` expression that appears in both the host and the GPU kernel, right now the GPU sort kernel is littered with `log2(N)` computations like this (note the many calls to `call_spirv_pure_glsl450`, which would be totally unnecessary if we had TIR-level CSE): https://gist.github.com/masahi/7a755ef67009e1a836e3212c53cf496f

Is this PR going to solve my problem?
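
To make the host/GPU pattern above concrete, here is a rough plain-Python sketch. It is not TVM or TIR code and is not taken from the linked gist: `log2_ceil`, `merge_stage`, `sort_bottom_up`, and the `counter` dict are hypothetical names for illustration only, and each "kernel launch" is just a Python function call. It only shows the difference between every launch re-evaluating `log2(N)` and the host evaluating it once and passing the result in.

```python
import math

# Counts how often the shared log2(N) expression is evaluated.
counter = {"log2": 0}

def log2_ceil(n):
    """Number of bottom-up merge stages needed for n elements."""
    counter["log2"] += 1
    return math.ceil(math.log2(n)) if n > 1 else 0

def merge_stage(data, stage, num_stages=None):
    """Stand-in for one GPU kernel launch (hypothetical, runs on the CPU).

    If the host does not pass num_stages, the "kernel" recomputes it,
    which mirrors the redundant log2(N) computation described above.
    """
    if num_stages is None:
        num_stages = log2_ceil(len(data))
    assert stage < num_stages
    width = 1 << stage  # length of the already-sorted runs at this stage
    out = []
    for lo in range(0, len(data), 2 * width):
        # Simplification: sorted() stands in for merging two sorted runs.
        out.extend(sorted(data[lo:lo + 2 * width]))
    return out

def sort_bottom_up(data, hoist):
    """Host loop: one evaluation of log2(N), then one launch per stage."""
    num_stages = log2_ceil(len(data))
    for stage in range(num_stages):
        data = merge_stage(data, stage, num_stages if hoist else None)
    return data
```

In this sketch, calling `sort_bottom_up(values, hoist=True)` evaluates `log2_ceil` once in total, while `hoist=False` evaluates it once on the host plus once per launch. Whether a TIR-level CSE pass can perform the equivalent hoisting across the host/device boundary is exactly the open question here.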
