ChaiBapchya opened a new pull request #17735: Fix OpPerf in Master
URL: https://github.com/apache/incubator-mxnet/pull/17735

## Description ##

Change 1

PRs #17449 and #17400 were developed without knowledge of each other's changes. After both were merged, the optimizer refactor in opperf was left incomplete. #17449 added a set of variables for large-tensor benchmarks, and #17400 renamed two variables, `gamma1` and `gamma2`, to `rho` and `momentum`. This PR fixes that conflict.

Change 2

Running the entire opperf suite on a build with CUDA=ON and CUDNN=ON showed that the BatchNorm benchmark fails. The benchmark uses `eps=1e-08`, but the cuDNN BatchNorm implementation requires `eps` to be no less than `1e-05`:

```
<function BatchNorm at 0x7f46869e3bf8>
Traceback (most recent call last):
  File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
    sys.exit(main())
  File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
    benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
  File "incubator-mxnet/benchmark/opperf/opperf.py", line 99, in run_all_mxnet_operator_benchmarks
    mxnet_operator_benchmark_results.append(run_nn_basic_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_basic_operators.py", line 143, in run_nn_basic_operators_benchmarks
    mx_nn_basic_op_results = run_op_benchmarks(mx_nn_basic_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 211, in run_op_benchmarks
    warmup=warmup, runs=runs)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 178, in run_performance_test
    benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 115, in _run_nd_operator_performance_test
    _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
    res = func(*args, **kwargs)
  File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 60, in nd_forward_backward_and_profile
    nd.waitall()
  File "/home/ubuntu/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 206, in waitall
    check_call(_LIB.MXNDArrayWaitAll())
  File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "/home/ubuntu/incubator-mxnet/src/operator/nn/./cudnn/cudnn_batch_norm-inl.h", line 62
MXNetError: Check failed: param.eps >= 1e-5 (1e-08 vs. 1e-05) : CuDNN requires eps to be no less than 1e-05
```

## Checklist ##

### Essentials ###

Please feel free to remove inapplicable items for your PR.

- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] Code is well-documented
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
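Change 1 above is essentially a rename conflict. One PR added parameter defaults that still use `gamma1`/`gamma2`, while the other PR switched the optimizer to `rho`/`momentum`. The sketch below shows that kind of fix under stated assumptions. The dictionary names (`DEFAULTS`, `DEFAULTS_LARGE_TENSOR`), the helper `migrate_params`, and the placeholder values are hypothetical and are not the actual opperf code or defaults.

```python
# Hypothetical sketch: names and values are illustrative placeholders,
# not the real opperf default parameters.

# Mapping from the pre-refactor parameter names to the refactored ones.
RENAMED_PARAMS = {"gamma1": "rho", "gamma2": "momentum"}


def migrate_params(params):
    """Return a copy of `params` with legacy optimizer keys renamed.

    Raises ValueError if a legacy key and its new name are both present,
    since silently overwriting one value with the other would hide a bug.
    """
    migrated = {}
    for key, value in params.items():
        new_key = RENAMED_PARAMS.get(key, key)
        if new_key in migrated:
            raise ValueError(f"duplicate parameter after rename: {new_key}")
        migrated[new_key] = value
    return migrated


# Regular-tensor defaults already use the new names; the large-tensor set,
# added concurrently, still uses the old ones.
DEFAULTS = {"rho": [0.9], "momentum": [0.9]}
DEFAULTS_LARGE_TENSOR = {"gamma1": [0.9], "gamma2": [0.9]}

DEFAULTS_LARGE_TENSOR = migrate_params(DEFAULTS_LARGE_TENSOR)

# Both default sets now expose the same parameter names.
assert set(DEFAULTS_LARGE_TENSOR) == set(DEFAULTS)
```

Renaming through a single mapping keeps the two default sets in sync. If the optimizer signature changes again, only one place needs updating.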
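The BatchNorm failure in Change 2 above comes from the cuDNN check `param.eps >= 1e-5`. The sketch below shows one way a benchmark could pick an `eps` that the cuDNN path accepts. It assumes the caller knows whether the cuDNN implementation will be used. The function `batchnorm_eps` and the constant `CUDNN_MIN_BN_EPS` are hypothetical names, not part of opperf or MXNet.

```python
# Hypothetical sketch: names are illustrative, not part of opperf or MXNet.

# Minimum taken from the error "CuDNN requires eps to be no less than 1e-05".
CUDNN_MIN_BN_EPS = 1e-5


def batchnorm_eps(requested_eps, uses_cudnn):
    """Return an eps value that the selected BatchNorm backend accepts.

    On the cuDNN path, values below the minimum are raised to the minimum.
    Other backends get the requested value unchanged.
    """
    if uses_cudnn and requested_eps < CUDNN_MIN_BN_EPS:
        return CUDNN_MIN_BN_EPS
    return requested_eps


# The failing configuration from the traceback: eps=1e-08 on a CUDNN=ON build.
print(batchnorm_eps(1e-8, uses_cudnn=True))
```

Clamping changes the benchmarked configuration on GPU only. A simpler alternative is to use an `eps` of at least `1e-05` for every context, so CPU and GPU numbers stay comparable.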
