samskalicky commented on a change in pull request #19385:
URL: https://github.com/apache/incubator-mxnet/pull/19385#discussion_r508916239



##########
File path: 
docs/python_docs/python/tutorials/performance/backend/tensorrt/tensorrt.md
##########
@@ -33,74 +33,92 @@ from mxnet.gluon.model_zoo import vision
 import time
 import os
 
+ctx = mx.gpu(0)
+
+path = 'resnet18_v2'
 batch_shape = (1, 3, 224, 224)
-resnet18 = vision.resnet18_v2(pretrained=True)
-resnet18.hybridize()
-resnet18.forward(mx.nd.zeros(batch_shape))
-resnet18.export('resnet18_v2')
-sym, arg_params, aux_params = mx.model.load_checkpoint('resnet18_v2', 0)
+x = mx.nd.zeros(batch_shape, ctx=ctx)
+
+model = vision.resnet18_v2(pretrained=True)
+model.hybridize(static_shape=True, static_alloc=True)
+model.collect_params().reset_ctx(ctx)
+
 ```
-In our first section of code we import the modules needed to run MXNet, and to
-time our benchmark runs. We then download a pretrained version of Resnet18,
-hybridize it, and load it symbolically. It's important to note that the
-experimental version of TensorRT integration will only work with the symbolic
-MXNet API. If you're using Gluon, you must
-[hybridize](https://gluon.mxnet.io/chapter07_distributed-learning/hybridize.html)
-your computation graph and export it as a symbol before running inference.
-This may be addressed in future releases of MXNet, but in general if you're
-concerned about getting the best inference performance possible from your
-models, it's a good practice to hybridize.
+In our first section of code we import the modules needed to run MXNet, and to
+time our benchmark runs. We then download a pretrained version of Resnet18. We
+hybridize (link to hybridization) it with static_alloc and static_shape to get
+the best performance.
 
 ## MXNet Baseline Performance
 ```python
-# Create sample input
-input = mx.nd.zeros(batch_shape)
-
-# Execute with MXNet
-executor = sym.simple_bind(ctx=mx.gpu(0), data=batch_shape, grad_req='null', force_rebind=True)
-executor.copy_params_from(arg_params, aux_params)
-
 # Warmup
-print('Warming up MXNet')
-for i in range(0, 10):
-    y_gen = executor.forward(is_train=False, data=input)
-    y_gen[0].wait_to_read()
+for i in range(0, 1000):
+    out = model(x)
+    out[0].wait_to_read()

Review comment:
       I prefer to have just `mx.nd.waitall()` instead of calling `wait_to_read()` for each output. But I'll leave it up to you.
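
The difference the comment is pointing at — blocking on one output versus a single global barrier over all pending asynchronous work — can be sketched with a plain-Python analogy. This uses `concurrent.futures` as a hypothetical stand-in for MXNet's asynchronous engine (it is not MXNet code): each submitted task plays the role of a forward pass whose result materializes later.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical stand-in for MXNet's asynchronous engine: each submitted
# task is like one forward pass whose result is computed in the background.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(lambda i=i: i * i) for i in range(10)]

    # Per-output synchronization, analogous to out[0].wait_to_read():
    # blocks only until this one result is ready.
    first = futures[0].result()

    # Single global barrier, analogous to mx.nd.waitall():
    # blocks until every pending task has finished.
    wait(futures)
    results = [f.result() for f in futures]

assert first == 0
assert results == [i * i for i in range(10)]
```

For a warmup loop like the one in the diff, a single barrier after the loop is simpler and synchronizes the same work; per-output waits matter more when you need each result before issuing the next operation.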




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

