ChaiBapchya opened a new pull request #17735: Fix OpPerf in Master
URL: https://github.com/apache/incubator-mxnet/pull/17735
 
 
   ## Description ##
   
   Change 1
   After PRs #17449 and #17400 were merged, the optimizer refactor was left
   incomplete because neither PR accounted for the changes made by the other.
   
   #17449 added a set of variables for large tensors, while #17400 renamed two
   variables, gamma1 and gamma2, to rho and momentum.
   
   This PR resolves that conflict.
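   
   As a rough illustration of the kind of mismatch described above, the sketch
   below renames stale argument names in a dictionary of default inputs. The
   helper `apply_renames`, the dictionary contents, and the exact pairing
   (gamma1 to rho, gamma2 to momentum) are assumptions made for illustration;
   they are not taken from the opperf code.

```python
# Hypothetical rename map. The pairing gamma1 -> rho and gamma2 -> momentum is
# assumed from the order in which the description lists the names.
RENAMED_ARGS = {"gamma1": "rho", "gamma2": "momentum"}


def apply_renames(defaults, renames=RENAMED_ARGS):
    """Return a copy of `defaults` with old argument names replaced."""
    updated = {}
    for name, value in defaults.items():
        new_name = renames.get(name, name)
        if new_name in updated:
            # Both the old and the new name were present; refuse to guess.
            raise ValueError(f"duplicate argument after rename: {new_name}")
        updated[new_name] = value
    return updated


# Placeholder large-tensor defaults still written with the old names.
large_tensor_defaults = {"gamma1": [0.9], "gamma2": [0.1], "lr": [0.01]}
renamed = apply_renames(large_tensor_defaults)
```

   Raising on a duplicate name, rather than silently overwriting, makes it
   visible when two changes have each defined the same argument.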
   
   Change 2
   Running the entire opperf suite with CUDA=ON and CUDNN=ON showed that
   BatchNorm fails with the following error:
   ```
   <function BatchNorm at 0x7f46869e3bf8>
   Traceback (most recent call last):
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 213, in <module>
       sys.exit(main())
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 193, in main
       benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
     File "incubator-mxnet/benchmark/opperf/opperf.py", line 99, in run_all_mxnet_operator_benchmarks
       mxnet_operator_benchmark_results.append(run_nn_basic_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/nd_operations/nn_basic_operators.py", line 143, in run_nn_basic_operators_benchmarks
       mx_nn_basic_op_results = run_op_benchmarks(mx_nn_basic_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 211, in run_op_benchmarks
       warmup=warmup, runs=runs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 178, in run_performance_test
       benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 115, in _run_nd_operator_performance_test
       _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
       res = func(*args, **kwargs)
     File "/home/ubuntu/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 60, in nd_forward_backward_and_profile
       nd.waitall()
     File "/home/ubuntu/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 206, in waitall
       check_call(_LIB.MXNDArrayWaitAll())
     File "/home/ubuntu/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "/home/ubuntu/incubator-mxnet/src/operator/nn/./cudnn/cudnn_batch_norm-inl.h", line 62
   MXNetError: Check failed: param.eps >= 1e-5 (1e-08 vs. 1e-05) : CuDNN requires eps to be no less than 1e-05
   ```
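   
   The check at the end of the traceback rejects any eps below 1e-05 when
   cuDNN is used, and the benchmark passed 1e-08. One way to guard against
   this, sketched below, is to raise eps to that bound before calling the
   operator. The helper `clamp_eps_for_cudnn` and the example kwargs are
   hypothetical and are not necessarily how this PR fixes the failure.

```python
# Lower bound reported by the cuDNN check in the traceback above.
CUDNN_MIN_EPS = 1e-5


def clamp_eps_for_cudnn(kwargs, min_eps=CUDNN_MIN_EPS):
    """Return a copy of operator kwargs with eps raised to at least min_eps."""
    fixed = dict(kwargs)
    if "eps" in fixed and fixed["eps"] < min_eps:
        fixed["eps"] = min_eps
    return fixed


# Placeholder BatchNorm arguments using the failing eps from the traceback.
batchnorm_kwargs = {"eps": 1e-08, "fix_gamma": False}
safe_kwargs = clamp_eps_for_cudnn(batchnorm_kwargs)
```

   The helper returns a copy, so the original benchmark defaults stay
   unchanged for runs that do not use cuDNN.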
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - [ ] Code is well-documented: 
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
