Ni Hui created MXNET-97:

             Summary: implement DepthwiseConv2dBackwardFilterKernel from tensorflow codebase
                 Key: MXNET-97
             Project: Apache MXNet
          Issue Type: Improvement
            Reporter: Ni Hui

The current MXNet implementation calls __syncthreads() too often, which 
makes it extremely slow.
The new code comes from tensorflow, but the variable names are adjusted for 

My model uses depthwise convolution heavily, and its training time per 
iteration is now over 5x faster on a single P40 GPU (old 92 s vs. new 18 s).
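
To illustrate why fewer __syncthreads() calls can help, here is a minimal sketch of one general way to accumulate per-thread partial sums without any block-wide barrier: reduce inside each warp with register shuffles, then let one lane per warp issue an atomic add. This is not the code from the patch, nor from TensorFlow. The kernel name SumWithWarpShuffle, the launch configuration, and the plain sum (standing in for a filter-gradient accumulation) are assumptions made for the example. It also assumes CUDA 9 or later and a block size that is a multiple of 32.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only (not MXNet or TensorFlow code).
// Sums n floats into *out without shared memory or __syncthreads().
__global__ void SumWithWarpShuffle(const float* in, float* out, int n) {
  // Grid-stride loop: every thread builds a private partial sum.
  float partial = 0.0f;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    partial += in[i];
  }

  // Tree reduction inside the warp, exchanged through registers.
  // Assumes all 32 lanes are active (block size is a multiple of 32).
  for (int offset = warpSize / 2; offset > 0; offset /= 2) {
    partial += __shfl_down_sync(0xffffffffu, partial, offset);
  }

  // Lane 0 now holds the warp total; one atomic per warp, no barrier.
  if (threadIdx.x % warpSize == 0) {
    atomicAdd(out, partial);
  }
}

int main() {
  const int n = 1 << 20;
  std::vector<float> host(n, 1.0f);

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, sizeof(float));

  SumWithWarpShuffle<<<64, 256>>>(d_in, d_out, n);

  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  printf("sum = %.0f, expected = %d\n", result, n);

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

A shared-memory tree reduction over the same data would need a barrier after every halving step. In this sketch the only cross-thread communication is the shuffle inside a warp and the final atomic add.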

This message was sent by Atlassian JIRA
