KellenSunderland commented on issue #14684: When setting MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION no speedup observed
URL: https://github.com/apache/incubator-mxnet/issues/14684#issuecomment-482669415

Auto-tuning is overwriting the math mode at convolution tuning time. That was probably the right thing to do when implementing TCs, but it prevents the conversion math type from being used. We'll have to think about a long-term fix for this. For now I've commented out the math type reset locally, and I'm trying to verify that this cuDNN feature provides a significant speedup before moving forward.

The new output from the cuDNN API logging looks a bit happier:

```
I! CuDNN (v7301) function cudnnConvolutionForward() called:
i!     handle: type=cudnnHandle_t; streamId=0x7f12280023a0;
i!     alpha: type=CUDNN_DATA_FLOAT; val=1.000000;
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[32,512,15,20];
i!         strideA: type=int; val=[153600,300,20,1];
i!     xData: location=dev; addr=0x7f1094000000;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[2048,512,1,1];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i!     wData: location=dev; addr=0x7f1141000000;
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION (2);
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[0,0];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[1,1];
i!         groupCount: type=int; val=1;
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (1);
i!     workSpace: location=dev; addr=0x7f10bc000000;
i!     workSpaceSizeInBytes: type=size_t; val=90572576;
i!     beta: type=CUDNN_DATA_FLOAT; val=0.000000;
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[32,2048,15,20];
i!         strideA: type=int; val=[614400,300,20,1];
i!     yData: location=dev; addr=0x7f1096000000;
i! Time: 2019-04-12T11:02:48.313026 (0d+0h+0m+9s since start)
i! Process=12161; Thread=12212; GPU=0; Handle=0x7f1228118830; StreamId=0x7f12280023a0.
```
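
To check a whole log rather than a single call, the `mathType` entries can be tallied with a short script. This is only a rough sketch: `count_math_types` is a hypothetical helper name, and the regular expression assumes the exact `mathType: type=cudnnMathType_t; val=NAME (N);` layout seen in the v7301 log above, which other cuDNN versions may not share.

```python
import re
from collections import Counter

# Assumed layout, taken from the log above:
#   mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION (2);
MATH_TYPE_RE = re.compile(r"mathType: type=cudnnMathType_t; val=(\w+) \(\d+\);")


def count_math_types(log_text):
    """Count how often each cudnnMathType_t name appears in a cuDNN API log."""
    return Counter(match.group(1) for match in MATH_TYPE_RE.finditer(log_text))


if __name__ == "__main__":
    # A two-line excerpt in the same format as the log above.
    sample = (
        "i!     convDesc: type=cudnnConvolutionDescriptor_t:\n"
        "i!         mathType: type=cudnnMathType_t; "
        "val=CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION (2);\n"
    )
    for name, count in count_math_types(sample).items():
        print(f"{name}: {count}")
```

If auto-tuning were still resetting the math mode, a tally like this would be expected to show values other than `CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION` on the convolution descriptors.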
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
