CaptainDuke edited a comment on pull request #8479: URL: https://github.com/apache/tvm/pull/8479#issuecomment-884044279
> Could you provide timing information for a variety of shapes and ranks. I just want to make sure this is faster on all inputs.

@tkonolige We evaluated the performance on three combinations of rank and shape. Times (in nanoseconds) were collected with Nsight Systems. As long as the extent of the original `with ib.for_range() as i` loop is large enough, splitting the work into two separate kernels enlarges dimGrid and yields significantly better parallelism.
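The dimGrid argument above comes down to simple launch arithmetic. The sketch below is hypothetical and is not the PR's code. It assumes a fused kernel whose threads cover only an inner dimension of size `inner_size` and walk the `for_range` loop (extent `loop_extent`) serially. It also assumes a split kernel that maps every `(i, j)` pair onto its own thread. The helper names `blocks_needed`, `fused_grid`, and `split_grid` are illustrative.

```python
import math


def blocks_needed(num_threads: int, threads_per_block: int) -> int:
    """Number of thread blocks (dimGrid.x) needed to cover num_threads."""
    return math.ceil(num_threads / threads_per_block)


def fused_grid(loop_extent: int, inner_size: int, threads_per_block: int) -> int:
    # Hypothetical fused kernel: only inner_size threads are launched, and
    # each thread iterates over the outer loop serially, so loop_extent
    # does not affect the grid size at all.
    return blocks_needed(inner_size, threads_per_block)


def split_grid(loop_extent: int, inner_size: int, threads_per_block: int) -> int:
    # Hypothetical split kernel: the former serial loop is lifted into the
    # grid, so one thread is launched per (i, j) pair.
    return blocks_needed(loop_extent * inner_size, threads_per_block)


if __name__ == "__main__":
    # Large loop extent: the split version launches many more blocks.
    print(fused_grid(1024, 64, 256), split_grid(1024, 64, 256))
    # Loop extent of 1: both versions launch the same grid.
    print(fused_grid(1, 64, 256), split_grid(1, 64, 256))
```

With `loop_extent=1024`, `inner_size=64`, and 256 threads per block, the fused layout launches 1 block while the split layout launches 256. With `loop_extent=1` the two are identical. This matches the "large enough" condition in the comment. The sketch models only grid size. It ignores the extra kernel-launch overhead and memory traffic that a second kernel adds, which is why measured timings across shapes still matter.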
