fhieber commented on pull request #18531: URL: https://github.com/apache/incubator-mxnet/pull/18531#issuecomment-643804092
@eric-haibin-lin Thanks for the pointer. We had tried custom operators in a separate project (int8 quantization) but found them unsuitable for performance reasons.

@sxjscience Thanks! I will take a look. I haven't benchmarked SoftmaxOutput against a Gluon implementation recently; in mid-2019 we still observed a ~15-20% slowdown when not using SoftmaxOutput. I will give your label smoothing implementation a try, but my guess is that the combination of the softmax forward pass, the (smoothed) cross-entropy gradient backward pass, and support for easy masking without creating an explicit mask, all fused in a single operator, will still have a slight edge over an implementation composed of multiple operators.
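To make the comparison concrete, here is a minimal NumPy sketch of what the fused backward pass computes: the gradient of label-smoothed cross-entropy with respect to the logits, with masking driven directly by an ignore label rather than an explicit mask tensor. This is an illustration of the math, not the SoftmaxOutput implementation; the function name and the `alpha`/`ignore_label` parameters are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def smoothed_ce_grad(logits, labels, alpha=0.1, ignore_label=0):
    """Gradient of label-smoothed cross-entropy w.r.t. logits.

    Positions whose label equals ignore_label contribute zero gradient,
    so no explicit mask tensor is ever materialized.
    """
    n, v = logits.shape
    p = softmax(logits)
    # smoothed target distribution: alpha spread uniformly, (1 - alpha) added on the true class
    q = np.full((n, v), alpha / v)
    q[np.arange(n), labels] += 1.0 - alpha
    grad = p - q
    # masking by label value, in place of a separate mask input
    grad[labels == ignore_label] = 0.0
    return grad
```

A multi-operator Gluon implementation has to produce the same result from separate softmax, smoothing, and masking ops, which is where the extra memory traffic (and the slowdown mentioned above) can come from.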
