lanking520 opened a new issue #15067: CachedOp performance regression
URL: https://github.com/apache/incubator-mxnet/issues/15067
 
 
   Recently I have been running benchmarks on CachedOp performance and observed a regression in the results. Please see the table below:
   
   | Instance   | Module API | CachedOp with static | CachedOp without static |
   |------------|------------|----------------------|-------------------------|
   | p2.8xlarge | 43ms       | 42ms                 | 51ms                    |
   | p3.2xlarge | 11ms       | 19ms                 | 16ms                    |
   | c5.4xlarge | 36ms       | 38ms                 | 42ms                    |
   
   I would like to highlight the GPU comparison: on p2.8xlarge there is a performance gain with the flags below set, but on p3.2xlarge there is a regression.
   ```python
   imported_net.hybridize(static_alloc=True, static_shape=True)
   ```
   
   In theory, setting these two flags should yield a performance boost, since memory is planned once and reused across calls. However, on the larger GPU (V100 on p3 vs. K80 on p2) it performs worse than the Module API.
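
   For reference, the two CachedOp columns in the table differ only in the flags passed to `hybridize` (a minimal sketch; `static_alloc=True` pre-allocates and reuses memory for intermediate outputs across calls, and `static_shape=True` additionally assumes input shapes do not change between calls):
   ```python
   # CachedOp with static: memory plan and shapes are fixed after the first call
   imported_net.hybridize(static_alloc=True, static_shape=True)

   # CachedOp without static: default dynamic execution through CachedOp
   imported_net.hybridize()
   ```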
   
   ## Benchmark Script
   ```python
   import mxnet as mx
   from mxnet import ndarray as nd
   import numpy as np
   import json, time, os
   from mxnet import gluon
   
   path='http://data.mxnet.io/models/imagenet/'
   [mx.test_utils.download(path+'resnet/152-layers/resnet-152-0000.params'),
   mx.test_utils.download(path+'resnet/152-layers/resnet-152-symbol.json'),
   mx.test_utils.download(path+'synset.txt')]
   
   
   def compute_stats(perf_results, results):
     results["average"] = np.average(perf_results)
     results['tp50'] = np.percentile(perf_results, 50)
     results['tp90'] = np.percentile(perf_results, 90)
     results['tp99'] = np.percentile(perf_results, 99)
   
   ctx_str = os.environ['BENCHMARK_CTX']

   if ctx_str == 'GPU':
     ctx = mx.gpu(0)
   elif ctx_str == 'CPU':
     ctx = mx.cpu()
   else:
     raise ValueError("BENCHMARK_CTX must be 'GPU' or 'CPU'")
   
   benchmark = {}
   
   prefix = 'resnet-152'
   
   # Model load time
   t1 = time.time()
   imported_net = gluon.nn.SymbolBlock.imports(prefix + '-symbol.json',
                                               ['data', 'softmax_label'],
                                               prefix + '-0000.params')
   t2 = time.time()
   elapsed = (t2 - t1) * 1000
   
   # Toggle these flags to switch between the 'with static' and 'without static' runs
   imported_net.hybridize(static_alloc=True, static_shape=True)
   
   benchmark['ModelLoadTime'] = elapsed
   
   fname = mx.test_utils.download(
       'https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
   img = mx.image.imread(fname)
   
   
   # convert into format (batch, channel, height, width)
   img = mx.image.imresize(img, 300, 300) # resize
   img = img.transpose((2, 0, 1)) # Channel first
   img = img.expand_dims(axis=0) # batchify
   img = img.astype('float32')
   
   # dummy softmax_label input required by the imported symbol;
   # it must live on the same context as the image input
   sf_label = nd.ones((1,), ctx=ctx)

   img = img.as_in_context(ctx)
   
   # First inference (includes one-time graph construction and memory allocation)
   t1 = time.time()
   op = imported_net(img, sf_label)
   op.wait_to_read()
   t2 = time.time()
   elapsed = (t2 - t1) * 1000
   
   benchmark['FirstInferCall'] = elapsed
   
   times = 100
   time_cost = []
   
   for _ in range(times):
     t1 = time.time()
     op = imported_net(img, sf_label)
     op.wait_to_read()
     t2 = time.time()
     elapsed = (t2 - t1) * 1000
     time_cost.append(elapsed)
     print("time cost: ", elapsed, "ms")
   
   # One-time overhead of the first call relative to a steady-state call
   # (the original script overwrote 'ModelLoadTime' here, losing that measurement)
   benchmark['FirstInferOverhead'] = benchmark['FirstInferCall'] - time_cost[0]
   compute_stats(time_cost, benchmark)
   
   output = json.dumps(benchmark)
   
   with open('Inf.json', 'w') as f:
     f.write(output)
   ```
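
   To reproduce, set `BENCHMARK_CTX=GPU` or `BENCHMARK_CTX=CPU` in the environment before running the script.

   For completeness, below is a minimal sketch (not part of the benchmark above; names and shapes mirror it) of how the Module API column could be measured against the same checkpoint:
   ```python
   import time
   import mxnet as mx

   # Hypothetical Module API baseline for the first column of the table,
   # assuming the same resnet-152 checkpoint and a 1x3x300x300 input.
   ctx = mx.gpu(0)
   sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
   mod = mx.mod.Module(symbol=sym, context=ctx,
                       data_names=['data'], label_names=['softmax_label'])
   mod.bind(for_training=False,
            data_shapes=[('data', (1, 3, 300, 300))],
            label_shapes=[('softmax_label', (1,))])
   mod.set_params(arg_params, aux_params)

   batch = mx.io.DataBatch(data=[mx.nd.ones((1, 3, 300, 300), ctx=ctx)],
                           label=[mx.nd.ones((1,), ctx=ctx)])

   # Warm-up call, excluded from timing like FirstInferCall above
   mod.forward(batch, is_train=False)
   mod.get_outputs()[0].wait_to_read()

   t1 = time.time()
   for _ in range(100):
     mod.forward(batch, is_train=False)
     mod.get_outputs()[0].wait_to_read()
   print("avg per call: %.2f ms" % ((time.time() - t1) * 1000 / 100))
   ```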
   
