Ishitori commented on a change in pull request #11274: [MXNET-547] Tutorial
explaining how to use the profiler
URL: https://github.com/apache/incubator-mxnet/pull/11274#discussion_r195571404
##########
File path: docs/tutorials/python/profiler.md
##########
@@ -0,0 +1,198 @@
+# Profiling MXNet Models
+
+It is often helpful to understand how much time each operation takes while
running a model. This helps optimize the model to run faster. In this tutorial,
we will learn how to profile MXNet models to measure their running time and
memory consumption using the MXNet profiler.
+
+## The incorrect way to profile
+
+If you have just begun using MXNet, you might be tempted to measure the
execution time of your model using Python's `time` module, as shown below:
+
+```python
+from time import time
+from mxnet import autograd, nd
+import mxnet as mx
+
+start = time()
+x = nd.random_uniform(shape=(2000,2000))
+y = nd.dot(x, x)
+print('Time for matrix multiplication: %f sec\n' % (time() - start))
+
+start = time()
+print(y.asnumpy())
+print('Time for printing the output: %f sec' % (time() - start))
+```
+
+
+**Time for matrix multiplication: 0.005051 sec**<!--notebook-skip-line-->
+
+[[501.1584 508.29724 495.65237 ... 492.84705 492.69092 490.0481 ]<!--notebook-skip-line-->
+
+ [508.81058 507.1822 495.1743 ... 503.10526 497.29315 493.67917]<!--notebook-skip-line-->
+
+ [489.56598 499.47015 490.17722 ... 490.99945 488.05008 483.28836]<!--notebook-skip-line-->
+
+ ...<!--notebook-skip-line-->
+
+ [484.0019 495.7179 479.92142 ... 493.69952 478.89194 487.2074 ]<!--notebook-skip-line-->
+
+ [499.64932 507.65094 497.5938 ... 493.0474 500.74512 495.82712]<!--notebook-skip-line-->
+
+ [516.0143 519.1715 506.354 ... 510.08878 496.35608 495.42523]]<!--notebook-skip-line-->
+
+**Time for printing the output: 0.167693 sec**<!--notebook-skip-line-->
+
+
+From the output above, it seems as if printing the output takes a lot more time
than multiplying two large matrices. That doesn't feel right.
+
+This is because, in MXNet, all operations are executed asynchronously. So,
when `nd.dot(x, x)` returns, the matrix multiplication is not complete; it has
only been queued for execution. `asnumpy` in `print(y.asnumpy())`, however,
waits for the result to be computed and hence takes longer.
+
+While it is possible to call `nd.waitall()` before and after operations to
get their running time, this method does not scale to measuring the running
time of multiple sets of operations, especially in a Sequential or Hybrid
network.
+
+## The correct way to profile
+
+The correct way to measure the running time of MXNet models is to use the MXNet
profiler. In the rest of this tutorial, we will learn how to use the MXNet
profiler to measure the running time and memory consumption of MXNet models.
+
+To use the profiler, you need to build MXNet with `USE_PROFILER` enabled. For
example, this command will build the CPU version of MXNet on Linux:
+
+```
+make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_PROFILER=1
Review comment:
It would be useful to provide the build command for the GPU version as well,
since I assume many people would want to use it.
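
On the asynchronous-execution paragraph earlier in this hunk: a small standard-library analogy might help readers see why the first timing is misleading. The sketch below does not use MXNet at all. `slow_square` and `time_call` are hypothetical helpers, and a single-worker thread pool stands in for MXNet's execution engine, so treat it as an illustration of the timing pitfall under those assumptions rather than a description of how the engine works.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Stand-in for an expensive operation (hypothetical, not an MXNet call)."""
    time.sleep(0.2)
    return n * n


def time_call(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=1) as pool:
    # Submitting only queues the work, so this returns almost immediately.
    future, enqueue_sec = time_call(lambda: pool.submit(slow_square, 12))
    # Asking for the result blocks until the queued work has finished.
    value, wait_sec = time_call(future.result)

print('Time to enqueue the operation: %f sec' % enqueue_sec)
print('Time to wait for the result: %f sec' % wait_sec)
```

In the tutorial's example, `nd.dot(x, x)` plays the role of `pool.submit`, and `y.asnumpy()` plays the role of `future.result`.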