vielmetti commented on issue #18561: URL: https://github.com/apache/tvm/issues/18561#issuecomment-3793544990
> Input shape: (14, 23, 67, 99) ≈ 1.7M elements

Have you tried this with various input shapes to see whether the sizes of the parameters affect the slowdown or speedup? In particular, if you make the element count 2x/4x/8x the size, is the resulting computation 2x/4x/8x the cost?

Does this same slowdown happen on other RVV hardware or in emulation? I remember old arm64 server hardware where the NEON units were nominally a speedup but in practice were so slow that scalar operations turned out to be faster.
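The scaling check suggested above could be sketched roughly as follows. This is only an illustration in plain Python: `placeholder_workload` is a hypothetical stand-in for the operator actually being measured (it is not the kernel from the issue and makes no TVM calls), and scaling the leading dimension is just one assumed way to grow the element count by 2x/4x/8x.

```python
import math
import time


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    return math.prod(shape)


def scaled_shapes(base_shape, factors=(1, 2, 4, 8)):
    """Scale the leading dimension so the element count grows by each factor."""
    first, *rest = base_shape
    return [(first * f, *rest) for f in factors]


def time_workload(workload, shape, repeats=3):
    """Best-of-N wall-clock time in seconds for workload(shape)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload(shape)
        best = min(best, time.perf_counter() - start)
    return best


def scaling_report(shapes, timings):
    """Relate each run to the first one: (shape, size ratio, cost ratio)."""
    base_n = element_count(shapes[0])
    base_t = timings[0]
    return [
        (s, element_count(s) / base_n, t / base_t)
        for s, t in zip(shapes, timings)
    ]


def placeholder_workload(shape):
    # Hypothetical stand-in: swap in the scalar or RVV build of the operator.
    total = 0
    for i in range(element_count(shape) // 1000):
        total += i
    return total


if __name__ == "__main__":
    shapes = scaled_shapes((14, 23, 67, 99))
    timings = [time_workload(placeholder_workload, s) for s in shapes]
    for shape, size_ratio, cost_ratio in scaling_report(shapes, timings):
        print(f"{shape}: {size_ratio:.0f}x elements, {cost_ratio:.2f}x time")
```

If the cost ratio tracks the size ratio for both the scalar and the vectorized build, the slowdown is probably a constant per-element factor; if the two diverge as the input grows, the input size itself is likely part of the problem.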
