masahi opened a new pull request #7463:
URL: https://github.com/apache/tvm/pull/7463
I've been working on optimizing MaskRCNN/FasterRCNN. One thing I found is that Ansor generates better code for the NHWC layout, so I'm looking at improving end-to-end NHWC support and performance.

`roi_align` only supports the NCHW layout, so a `layout_transform` is inserted before and after each `roi_align`. This layout transform turned out to be expensive on GPU, so I added an NHWC implementation to `roi_align` to remove the `layout_transform`. This cuts Faster RCNN runtime by 8 milliseconds.

Now, with Ansor NHWC tuning + the `roi_align` improvement in this PR, we are beating PyTorch by a large margin:

```
TVM NHWC(Ansor) + cublas: 0.0589 milli sec
PyTorch: 0.0738 milli sec
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
