DickJC123 commented on a change in pull request #13749: Add NHWC layout support
to Pooling (cpu, gpu cuda, gpu cuDNN)
URL: https://github.com/apache/incubator-mxnet/pull/13749#discussion_r254117531
##########
File path: src/operator/nn/pool_utils.h
##########
@@ -98,14 +98,16 @@ struct lp_grad<DType, 1> {
template<typename DType>
struct lp_grad<DType, 2> {
static MSHADOW_XINLINE DType Map(const DType grad, const DType in_data,
const DType out_data) {
- return grad * in_data / out_data;
+ // Avoid nan result if both grad and out_data are 0.
+ return (grad == DType(0.0)) ? DType(0.0) : grad * in_data / out_data;
}
};
template<typename DType>
struct lp_grad<DType, 3> {
static MSHADOW_XINLINE DType Map(const DType grad, const DType in_data,
const DType out_data) {
- return grad * in_data * in_data / (out_data * out_data);
+ // Avoid nan result if both grad and out_data are 0.
+ return (grad == DType(0.0)) ? DType(0.0) : grad * in_data * in_data /
+ (out_data * out_data);
Review comment:
I've pushed my solution to your comment in commit
https://github.com/apache/incubator-mxnet/pull/13749/commits/098bc49f1d288ea9f2b64453aefcc1537ca5254e.
The grad == 0.0 check you highlighted only succeeded because of a quirk of our
check_consistency() routine in test_utils.py, which reuses the symbol forward()
output as the gradient. Per your suggestion, I'm now checking out_data == 0
instead, which is the more general way of quieting the test failures.
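To illustrate the guard, here is a minimal NumPy sketch of the lp-3 gradient with the out_data == 0 check (illustrative only, not the actual mshadow C++ in the patch):

```python
import numpy as np

def lp3_grad(grad, in_data, out_data):
    # Guard on out_data == 0 rather than grad == 0: if the forward output
    # underflowed to zero, return a zero gradient instead of inf/nan.
    if out_data == 0:
        return np.float32(0.0)
    return grad * in_data * in_data / (out_data * out_data)

# Underflowed forward output: guarded gradient is 0 rather than inf.
print(lp3_grad(np.float32(1.0), np.float32(2.0 ** -9), np.float32(0.0)))  # 0.0
```

With the old grad == 0 check, the same call with a nonzero grad would still divide by zero.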
The test failures I was seeing most often occurred in float16 lp-3 pooling. For
example, take a pool window of 2 with identical inputs 2^-9 and 2^-9. The
forward output is the cube root of (2^-9)^3 + (2^-9)^3. Calculated in float16,
the 2^-27 terms underflow to 0 (the smallest float16 subnormal is 2^-24) and
the output is 0. The backward output is then grad * 2^-9 * 2^-9 / (0 * 0) =
+inf (or nan if grad is also 0). Performed in float32, no underflow occurs in
the forward op, and the +infs are avoided in the backward op.
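The underflow scenario above can be reproduced directly with NumPy (a standalone sketch, not MXNet code):

```python
import numpy as np

def lp3_forward(x, dtype):
    # Forward lp-3 pooling over the whole window: cbrt(sum(x^3)).
    x = x.astype(dtype)
    return np.cbrt((x ** 3).sum(dtype=dtype))

window = np.array([2.0 ** -9, 2.0 ** -9])

out16 = lp3_forward(window, np.float16)  # (2^-9)^3 = 2^-27 underflows to 0
out32 = lp3_forward(window, np.float32)  # 2^-27 is representable in float32

print(out16)  # 0.0
print(out32)  # ~2^-9 * cbrt(2)

# Backward pass grad * in^2 / out^2 blows up when out underflowed to 0.
grad = np.float16(1.0)
with np.errstate(divide="ignore"):
    dgrad16 = grad * window[0] ** 2 / (out16 * out16)
print(np.isinf(dgrad16))  # True
```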
My conclusion: float16 is ill-equipped to perform the forward pooling
computation for lp-2 and lp-3. Part of my solution therefore promotes the
internal calculation to float32 for the CPU and MXNet CUDA implementations of
float16-i/o pooling. This is consistent with other float16-i/o operators like
Convolution and BatchNorm, which perform their internal calculations in
float32. I've run test_pooling_versions() thousands of times in this mode with
no failures.
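The accumulate-in-float32 idea can be sketched as follows (the function name is illustrative, not an actual MXNet kernel):

```python
import numpy as np

def lp3_pool_fp16(window):
    # Inputs and outputs are float16, but the reduction runs in float32,
    # so the 2^-27 cube terms no longer underflow.
    acc = window.astype(np.float32)
    out = np.cbrt((acc ** 3).sum())
    return out.astype(np.float16)

window = np.full(2, 2.0 ** -9, dtype=np.float16)
print(lp3_pool_fp16(window))  # nonzero: no underflow in the reduction
```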
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services