ekalda commented on PR #16782: URL: https://github.com/apache/tvm/pull/16782#issuecomment-2020983646
Thank you for your feedback @Lunderberg, much appreciated! > The implementation looks reasonable, though I have one main question for it: What is the behavior of the updated pass for a target that doesn't support SVE? Prior SVE-commits enabled the functionality, but didn't produce SVE in any of the default lowering passes. > > From [this line](https://github.com/apache/tvm/pull/16696/files#diff-f61b04b100f5145f2681340c81d3f2af221239594ed01e2e24896522329ce92cR598-R600), versions of LLVM before 11.0 do not support SVE, nor from my brief reading of the CUDA codegen [here](https://github.com/apache/tvm/blob/main/src/target/source/codegen_cuda.cc#L253) does CUDA. When it comes to targets that don't support SVE, I'd expect these targets to not trigger the creation of scalable vectors. In the current plan the creation of scalable vectors has to be intentional, i.e. it comes from splitting an axis by a `vscale` dependent expression in the (target dependent) schedules and vectorizing the resulting axis. If the `LoopVectorizer` is trying to create scalable vectors for target that doesn't support it, something has gone wrong and the compilation will fall over at some point: * If by some mistake a schedule that doesn't support VLA programming contains `vscale`, it will fall over latest in a target dependent codegen * If there is an attempt to vectorize loops with non-int extent that doesn't contain `vscale`, the "scalable ramp" creation will error since it expects the `PrimExpr lanes` in a form `vscale * int`. I realize though that this is a weird deviation from a current behaviour of ``` if (!extent_as_int || extent_as_int->value < 1) { LOG(FATAL) << "Failed to vectorize loop with extent " << op->extent; } ``` so I'll modify the patch such that it checks for a target and fails as before if the extent is not an int. > Since `VectorizeLoop` occurs after the `BindTarget` pass, we can check the function attribute to know which target will be executing each function. I think we should have the loop vectorization apply only to fixed-extent loops by default, but enable the scalable vectorization for targets that support it. In principle I'm not against making vectorizing for scalable vectors functionality more explicitly target specific, but it is not obvious to me what that would mean in terms of code? `ICHECK`s for the appropriate targets at the places where scalable vectors are created? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
