masahi opened a new pull request #7463:
URL: https://github.com/apache/tvm/pull/7463


   I've been working on optimizing MaskRCNN/FasterRCNN, and one thing I found was 
that Ansor generates better code for the NHWC layout. So I'm looking at improving 
end-to-end NHWC support and performance.
   
   `roi_align` only supports the NCHW layout, so a `layout_transform` is inserted 
before and after each `roi_align`. This layout transform turned out to be expensive 
on GPU, so I added an NHWC implementation to `roi_align` to remove the 
`layout_transform`. This cuts Faster RCNN runtime by 8 milliseconds.
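
To illustrate why a native NHWC path makes the `layout_transform` unnecessary, here is a minimal pure-Python sketch of the bilinear sampling step that `roi_align` is built on, written once per layout. It is not the code in this PR: the helper names (`sample_nchw`, `sample_nhwc`, `nchw_to_nhwc`) are hypothetical, it handles a single image and a single sample point, and it omits the averaging over sample points per bin that a real `roi_align` performs.

```python
import math


def _corners(height, width, y, x):
    """Return the four neighbouring pixels of (y, x) with bilinear weights."""
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    ly, lx = y - y0, x - x0
    return [
        (y0, x0, (1 - ly) * (1 - lx)),
        (y0, x1, (1 - ly) * lx),
        (y1, x0, ly * (1 - lx)),
        (y1, x1, ly * lx),
    ]


def sample_nchw(feat, y, x):
    """feat is indexed as feat[c][h][w]; returns one value per channel."""
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    corners = _corners(height, width, y, x)
    return [sum(w * feat[c][h][v] for h, v, w in corners) for c in range(channels)]


def sample_nhwc(feat, y, x):
    """feat is indexed as feat[h][w][c]; returns one value per channel."""
    height, width, channels = len(feat), len(feat[0]), len(feat[0][0])
    out = [0.0] * channels
    for h, v, w in _corners(height, width, y, x):
        pixel = feat[h][v]  # all channels of one pixel sit next to each other
        for c in range(channels):
            out[c] += w * pixel[c]
    return out


def nchw_to_nhwc(feat):
    """Stand-in for a layout transform: pure data movement, no arithmetic."""
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    return [[[feat[c][h][w] for c in range(channels)] for w in range(width)]
            for h in range(height)]


if __name__ == "__main__":
    # Toy 2-channel, 3x3 feature map (values chosen for illustration only).
    nchw = [[[c * 100 + h * 10 + w for w in range(3)] for h in range(3)]
            for c in range(2)]
    nhwc = nchw_to_nhwc(nchw)
    print(sample_nchw(nchw, 0.5, 1.5))
    print(sample_nhwc(nhwc, 0.5, 1.5))
```

Both functions compute the same values, so the transform between layouts adds only data movement; sampling directly from NHWC input lets that copy be skipped.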
   
   Now, with Ansor NHWC tuning plus the `roi_align` improvement in this PR, we are 
beating PyTorch by a large margin (about 20% lower runtime):
   ```
   TVM NHWC(Ansor) + cublas: 0.0589 milli sec
   PyTorch: 0.0738 milli sec
   ```



