nswamy closed pull request #12230: Update Operator Implementation Tutorial
URL: https://github.com/apache/incubator-mxnet/pull/12230
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/docs/faq/add_op_in_backend.md b/docs/faq/add_op_in_backend.md
index 03d77510aac..2fcc678b8ca 100644
--- a/docs/faq/add_op_in_backend.md
+++ b/docs/faq/add_op_in_backend.md
@@ -55,7 +55,7 @@ Next, we are going to
 for registering `a`, `b`, and `c` in `quadratic_op-inl.h`.
 2. Define type and shape inference functions in `quadratic_op-inl.h`.
 3. Define forward and backward functions in `quadratic_op-inl.h`.
-4. Register the operator using [nnvm](https://github.com/dmlc/nnvm)
+4. Register the operator using [nnvm](https://docs.tvm.ai/dev/nnvm_overview.html)
 in `quadratic_op.cc` and `quadratic_op.cu` for CPU and GPU computing,
 respectively.
@@ -318,7 +318,10 @@ parallel computation on both CPU and GPU. This allows most of the simple
 operators to share the same piece of code for CPU and GPU as parallelization
 approaches are often identical on both types of devices. The kernel function
 is defined as the following, where the function
-`Map` is executed by each thread for each input element. To explain a little
+`Map` is executed by each thread for each input element. The `out_data.Size()`
+argument of the `Kernel::Launch` call is the factor by which the workload is
+parallelized among the threads, which here corresponds to the size of the
+output array. To explain a little
 bit more on the two macros used in the kernel struct: (1) `MSHADOW_XINLINE` is
 a consolidated macro for inlining functions compiled by both CPU and GPU
 compilers. It enables CPU and GPU computing to share the same piece of code.
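As an aside for readers following the hunk above: the `Kernel::Launch`/`Map` machinery itself is C++ code in `quadratic_op-inl.h`, but its per-element semantics can be sketched in plain Python. This is a loose analogy only, not MXNet API; the function name is just illustrative:

```python
import numpy as np

def quadratic_forward(in_data, a, b, c):
    # Kernel<...>::Launch(s, out_data.Size(), ...) conceptually runs
    # Map(i, ...) once per output element i, so out_data.Size() (the
    # number of output elements) is the degree of parallelism.
    out_data = np.empty_like(in_data)
    flat_in, flat_out = in_data.ravel(), out_data.ravel()
    for i in range(flat_out.size):  # each index i would be handled by one thread
        flat_out[i] = a * flat_in[i] ** 2 + b * flat_in[i] + c
    return out_data

print(quadratic_forward(np.array([0.0, 1.0, 2.0]), a=1.0, b=2.0, c=3.0))
# -> [ 3.  6. 11.]
```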
@@ -543,22 +546,21 @@ If you use python, when you type `import mxnet as mx`,
 two python functions for invoking your backend implementation are
 generated on the fly: one is for imperative programming registered as
 `mxnet.ndarray.quadratic` or `mx.nd.quadratic` for short;
-the other one is for symbolic
-programming registered under module `mxnet.symbol.quadratic`
-or `mx.sym.quadratic` for short.
+the other one is for symbolic programming registered under
+module `mxnet.symbol.quadratic` or `mx.sym.quadratic` for short.
 In order to unit test it in frontend, we need to add the following code
-to the python file `test_operator.py`. Note that while testing the
-forward pass is straightforward using `mx.nd.quadratic`, testing
-the backward involves a bit of more efforts. We create a
-`quadratic` symbol and feed it into the utility function `check_numeric_gradient`.
-The utility function will perform a perturbation on the input
-and calculate the response rate of the output using the
-[finite difference method](https://en.wikipedia.org/wiki/Finite_difference_method).
-Then it will compare the gradient from the backward pass with
-the values from the finite difference method. The test
-will be successful once the comparison satisfies user specified
-relative and absolute thresholds.
+to the python file `test_operator.py`. A typical operator implementation
+is tested against both the `symbol` API and the `ndarray` API, and the
+test below covers both. The imperative test exercises the `ndarray` API,
+`mx.nd.contrib.quadratic`. The `symbol` test covers the complete
+functionality of the operator: the forward pass and the backward
+pass. To facilitate the testing of these functionalities we use three
+helper functions available in the `mxnet.test_utils` module:
+ - `check_symbolic_forward`
+ - `check_symbolic_backward`
+ - `check_numeric_gradient`
+
 ```python
 def test_quadratic_function():
     def f(x, a, b, c):
@@ -567,29 +569,72 @@ def test_quadratic_function():
     a = np.random.random_sample()
     b = np.random.random_sample()
     c = np.random.random_sample()
-    for ndim in range(1, 6):
-        # check forward
-        shape = rand_shape_nd(ndim, 5)
-        data = rand_ndarray(shape=shape, stype='default')
-        data_np = data.asnumpy()
-        expected = f(data_np, a, b, c)
-        output = mx.nd.quadratic(data, a=a, b=b, c=c)
-        assert_almost_equal(output.asnumpy(), expected)
-
-        # check backward using finite difference
-        data = mx.sym.Variable('data')
-        quad_sym = mx.sym.quadratic(data=data, a=a, b=b, c=c)
-        check_numeric_gradient(quad_sym, [data_np])
+    data = mx.symbol.Variable('data')
+    quad_sym = mx.sym.contrib.quadratic(data=data, a=a, b=b, c=c)
+    for dtype in [np.float16, np.float32, np.float64]:
+        for ndim in range(1, 6):
+            shape = rand_shape_nd(ndim, 5)
+            data_np = np.random.randn(*shape).astype(dtype)
+            expected = f(data_np, a, b, c)
+            backward_expected = 2 * a * data_np + b
+
+            # check imperative forward
+            output = mx.nd.contrib.quadratic(mx.nd.array(data_np), a=a, b=b, c=c)
+            assert_almost_equal(output.asnumpy(), expected,
+                                rtol=1e-2 if dtype is np.float16 else 1e-5,
+                                atol=1e-2 if dtype is np.float16 else 1e-5)
+            # check symbolic forward
+            check_symbolic_forward(quad_sym, [data_np], [expected],
+                                   rtol=1e-2 if dtype is np.float16 else 1e-5,
+                                   atol=1e-2 if dtype is np.float16 else 1e-5)
+            # check symbolic backward
+            check_symbolic_backward(quad_sym, [data_np], [np.ones(expected.shape)],
+                                    [backward_expected],
+                                    rtol=1e-2 if dtype is np.float16 else 1e-5,
+                                    atol=1e-2 if dtype is np.float16 else 1e-5)
+            # check backward using finite difference
+            check_numeric_gradient(quad_sym, [data_np], atol=0.001)
+```
+
+In the above test we create a `quadratic` symbol and feed it into the three
+utility functions. `check_symbolic_forward` and `check_symbolic_backward`
+compare the computed values against the expected values that we pass
+as arguments. The `check_numeric_gradient` utility function
+performs [gradient checking](http://ufldl.stanford.edu/tutorial/supervised/DebuggingGradientChecking/)
+to verify the implementation of the operator's backward function.
+It perturbs the input and calculates the response rate of the output using the
+[finite difference method](https://en.wikipedia.org/wiki/Finite_difference_method).
+Then it compares the gradient from the backward pass with the values
+from the finite difference method. All three of these tests succeed
+once the comparison satisfies user-specified `rtol` and `atol` values, where
+`rtol` and `atol` stand for relative tolerance and absolute tolerance,
+respectively. They specify how far the computed values may deviate from the
+expected values, and are defined as follows:
 ```
-Note that here we used `mx.nd.quadratic` to test the forward function
-and `check_numeric_gradient` to test the backward function. In MXNet,
-two other utility functions are also commonly used: `check_symbolic_forward`
-and `check_symbolic_backward`. By using them in unit tests,
-users need to pass in the operator symbols and expected results
-for comparison. Please also note that
-we highly recommend adding `check_numeric_gradient` test for every operator
-with backward function implemented as it eliminates the possibility
-of passing incorrect expected results into `check_symbolic_backward`.
+abs(Expected_Value - Computed_Value) < RTOL * abs(Expected_Value) + ATOL
+```
+
+For example, if `rtol` is `1e-5` and `atol` is `1e-5` and the expected value is
+`1.5623145`, then the computed value must lie within the range
+`(1.562288876855, 1.562340123145)`, otherwise the test fails.
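For the curious, the tolerance window quoted in that example can be reproduced with a few lines of plain Python (nothing MXNet-specific):

```python
expected = 1.5623145
rtol = atol = 1e-5

# the test passes iff abs(expected - computed) < rtol * abs(expected) + atol
tol = rtol * abs(expected) + atol   # 2.5623145e-05
print(expected - tol, expected + tol)
# -> approximately 1.562288876855 and 1.562340123145
```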
+
+Make sure you tune the `rtol` and `atol` values accordingly: choosing very
+tight values for `rtol` and `atol` will likely make the test flaky. It is
+recommended that you use the flakiness checker tool to check whether the
+test you have written is flaky. You can run it for the above test with the
+following command:
+
+```bash
+python tools/flakiness_checker.py test_operator.test_quadratic_function
+```
+
+Please note that for `check_symbolic_forward` and `check_symbolic_backward` we pass
+both the operator symbols and the expected results for comparison; for
+`check_numeric_gradient` we pass only the operator symbol, as
+`check_numeric_gradient` computes the expected value using the finite
+difference method. This is why it is highly recommended to add a
+`check_numeric_gradient` test for every operator with a backward function
+implemented, as it eliminates the possibility of passing incorrect expected
+results into `check_symbolic_backward`.
 
 ## Summary
@@ -601,8 +646,8 @@ using nnvm.
 Congratulations! You now know how to add operators.
 We welcome your contributions to MXNet.
 
 **Note**: Source code in the tutorial can be found in
-[quadratic_op-inl.h](https://github.com/reminisce/mxnet/blob/add_op_example_for_tutorial/src/operator/tensor/quadratic_op-inl.h),
-[quadratic_op.cc](https://github.com/reminisce/mxnet/blob/add_op_example_for_tutorial/src/operator/tensor/quadratic_op.cc),
-[quadratic_op.cu](https://github.com/reminisce/mxnet/blob/add_op_example_for_tutorial/src/operator/tensor/quadratic_op.cu),
+[quadratic_op-inl.h](https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/quadratic_op-inl.h),
+[quadratic_op.cc](https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/quadratic_op.cc),
+[quadratic_op.cu](https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/quadratic_op.cu),
 and
-[test_operator.py](https://github.com/reminisce/mxnet/blob/add_op_example_for_tutorial/tests/python/unittest/test_operator.py#L4008).
+[test_operator.py](https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_operator.py#L6514).
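As a side note on the doc changes above, the idea behind `check_numeric_gradient` can be illustrated standalone with NumPy by comparing a central finite difference against the analytic gradient `2 * a * x + b` that the backward pass computes. This is a sketch only, not the `mxnet.test_utils` implementation; the coefficients and `eps` are arbitrary:

```python
import numpy as np

def f(x, a, b, c):
    return a * x**2 + b * x + c

a, b, c = 1.2, -0.5, 0.3   # arbitrary illustrative coefficients
eps = 1e-6                 # perturbation size for the finite difference
x = np.random.randn(3, 4)

# central finite difference approximation of df/dx
numeric = (f(x + eps, a, b, c) - f(x - eps, a, b, c)) / (2 * eps)
analytic = 2 * a * x + b   # the gradient the backward pass would produce
assert np.allclose(numeric, analytic, rtol=1e-4, atol=1e-4)
```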
diff --git a/tests/python/unittest/test_operator.py b/tests/python/unittest/test_operator.py
index a7f484e81b3..c3e122b1c18 100644
--- a/tests/python/unittest/test_operator.py
+++ b/tests/python/unittest/test_operator.py
@@ -6418,19 +6418,31 @@ def f(x, a, b, c):
     a = np.random.random_sample()
     b = np.random.random_sample()
     c = np.random.random_sample()
-    # check forward
-    for ndim in range(1, 6):
-        shape = rand_shape_nd(ndim, 5)
-        data = rand_ndarray(shape=shape, stype='default')
-        data_np = data.asnumpy()
-        expected = f(data_np, a, b, c)
-        output = mx.nd.contrib.quadratic(data, a=a, b=b, c=c)
-        assert_almost_equal(output.asnumpy(), expected, rtol=0.001, atol=0.0001)
-
-    # check backward using finite difference
-    data = mx.sym.Variable('data')
-    quad_sym = mx.sym.contrib.quadratic(data=data, a=a, b=b, c=c)
-    check_numeric_gradient(quad_sym, [data_np], atol=0.001)
+    data = mx.symbol.Variable('data')
+    quad_sym = mx.sym.contrib.quadratic(data=data, a=a, b=b, c=c)
+    for dtype in [np.float16, np.float32, np.float64]:
+        for ndim in range(1, 6):
+            shape = rand_shape_nd(ndim, 5)
+            data_np = np.random.randn(*shape).astype(dtype)
+            expected = f(data_np, a, b, c)
+            backward_expected = 2 * a * data_np + b
+
+            # check imperative forward
+            output = mx.nd.contrib.quadratic(mx.nd.array(data_np), a=a, b=b, c=c)
+            assert_almost_equal(output.asnumpy(), expected,
+                                rtol=1e-2 if dtype is np.float16 else 1e-5,
+                                atol=1e-2 if dtype is np.float16 else 1e-5)
+            # check symbolic forward
+            check_symbolic_forward(quad_sym, [data_np], [expected],
+                                   rtol=1e-2 if dtype is np.float16 else 1e-5,
+                                   atol=1e-2 if dtype is np.float16 else 1e-5)
+            # check symbolic backward
+            check_symbolic_backward(quad_sym, [data_np], [np.ones(expected.shape)],
+                                    [backward_expected],
+                                    rtol=1e-2 if dtype is np.float16 else 1e-5,
+                                    atol=1e-2 if dtype is np.float16 else 1e-5)
+            # check backward using finite difference
+            check_numeric_gradient(quad_sym, [data_np], atol=0.001)
 
 
 @with_seed()
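A closing note on the `rtol=1e-2 if dtype is np.float16 else 1e-5` pattern in the new test: half precision has a much coarser machine epsilon than single or double precision, which is why the float16 tolerances are looser. A quick NumPy check:

```python
import numpy as np

for dtype in [np.float16, np.float32, np.float64]:
    print(dtype.__name__, np.finfo(dtype).eps)
# float16 ~9.8e-04, float32 ~1.2e-07, float64 ~2.2e-16
```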
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above
to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services