bgawrych opened a new pull request #20894:
URL: https://github.com/apache/incubator-mxnet/pull/20894


   ## Description ##
   This change prevents MXNet from allocating additional memory for gradients in quantized models, since those gradient buffers can never be used anyway.
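For context, a Gluon `Parameter` whose `grad_req` is `'write'` or `'add'` gets a gradient array with the same shape and dtype as its data, while `grad_req='null'` skips that allocation. The rough sketch below is plain Python and not part of this PR. It estimates how much gradient memory a set of parameters would hold under that rule. `grad_buffer_bytes` is a hypothetical helper, and the parameter names and shapes are illustrative: they match ResNet-50's first convolution and final dense layer, not the whole network.

```python
from math import prod


def grad_buffer_bytes(param_shapes, itemsize=4, grad_req=None):
    """Hypothetical helper: bytes of gradient buffers a model would hold.

    param_shapes: dict mapping parameter name -> shape tuple.
    itemsize: bytes per element (4 for float32).
    grad_req: optional dict mapping name -> 'write' | 'add' | 'null';
              names not listed default to 'write'.
    """
    grad_req = grad_req or {}
    total = 0
    for name, shape in param_shapes.items():
        if grad_req.get(name, "write") == "null":
            continue  # no gradient array is allocated for this parameter
        total += prod(shape) * itemsize  # one same-shaped buffer per parameter
    return total


# Illustrative shapes only (not the full ResNet-50 parameter set).
shapes = {
    "conv0_weight": (64, 3, 7, 7),
    "dense0_weight": (1000, 2048),
    "dense0_bias": (1000,),
}

print(grad_buffer_bytes(shapes))  # 8233632
print(grad_buffer_bytes(shapes, grad_req={name: "null" for name in shapes}))  # 0
```

Applied to every parameter of a model, the same arithmetic shows why skipping gradient allocation for an inference-only quantized network can save a noticeable amount of memory.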
   
   Memory measurement script:
   ```python
   import mxnet as mx
   from mxnet.gluon.model_zoo import vision
   import psutil
   import os
   
   def get_process_memory():
       # Resident set size (RSS) of the current process, in MB
       process = psutil.Process(os.getpid())
       mem_info = process.memory_info()
       return mem_info.rss * 1e-6
   
   
   # A single random ImageNet-sized input, used for both inference and calibration
   batch_shape = (1, 3, 224, 224)
   data = mx.np.random.normal(size=batch_shape)
   
   print("memory before loading model: ", get_process_memory())
   net = vision.resnet50_v1(pretrained=True)
   print("memory after loading model: ", get_process_memory())
   out = net(data)
   out.wait_to_read()  # block until the asynchronous forward pass has finished
   print("memory after fp32 forward pass", get_process_memory())
   
   # Calibrate on the same single batch and quantize the network to int8
   dataset = mx.gluon.data.ArrayDataset(data)
   data_loader = mx.gluon.data.DataLoader(dataset, batch_size=1)
   net_quantized = mx.contrib.quant.quantize_net(net, quantized_dtype='int8',
                                                   quantize_mode="smart",
                                                   calib_mode='naive',
                                                   calib_data=data_loader,
                                                   num_calib_batches=1,
                                                   ctx=mx.current_context())
   
   print("memory after quantization: ", get_process_memory())
   
   outputs = net_quantized(data)
   outputs.wait_to_read()
   print("memory after int8 forward pass: ", get_process_memory())
   ```
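The outputs reported in this PR come from the script as written, which prints one absolute RSS reading per stage. When comparing runs, it can help to record per-stage growth instead. The sketch below is an optional variant, not part of the PR: `rss_mb` and `MemoryLog` are hypothetical names. The reader is injectable, so the class can be exercised without `psutil` installed.

```python
import os


def rss_mb():
    """Resident set size of the current process in MB (1 MB = 10**6 bytes)."""
    import psutil  # imported lazily so MemoryLog works with any reader
    return psutil.Process(os.getpid()).memory_info().rss * 1e-6


class MemoryLog:
    """Hypothetical helper: records labeled readings and per-stage growth."""

    def __init__(self, reader=rss_mb):
        self.reader = reader
        self.readings = []  # list of (label, MB) in the order they were taken

    def mark(self, label):
        value = self.reader()
        self.readings.append((label, value))
        return value

    def deltas(self):
        """Return [(label, MB gained since the previous mark)] for marks after the first."""
        return [
            (label, value - prev)
            for (_, prev), (label, value) in zip(self.readings, self.readings[1:])
        ]
```

In the measurement script, a call such as `log.mark("after quantization")` would replace the corresponding `print`, and `log.deltas()` would then report how much memory each stage added.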
   **Output before:**
   ```
   memory before loading model:  213.430272
   [15:14:11] ../src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
   memory after loading model:  530.702336
   memory after fp32 forward pass 611.241984
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1918: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1251: UserWarning: register_op_hook is experimental when static_alloc=True / static_shape=True and may not work correctly
     warnings.warn("register_op_hook is experimental when static_alloc=True / static_shape=True "
   memory after quantization:  1064.57088
   memory after int8 forward pass:  1071.005696
   ```
   
   **Output after:**
   ```
   memory before loading model:  214.28633599999998
   [15:13:17] ../src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
   memory after loading model:  531.2593919999999
   memory after fp32 forward pass 609.513472
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1918: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1251: UserWarning: register_op_hook is experimental when static_alloc=True / static_shape=True and may not work correctly
     warnings.warn("register_op_hook is experimental when static_alloc=True / static_shape=True "
   memory after quantization:  890.273792
   memory after int8 forward pass:  895.2258559999999
   ```
   
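   Memory usage drops significantly: after quantization the process uses about 890 MB instead of about 1065 MB (roughly 174 MB, or 16%, less), and after the int8 forward pass about 895 MB instead of about 1071 MB. The fp32 stages are essentially unchanged.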
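The savings can be computed directly from the two logs. The following is a small plain-Python sketch; `savings` is a hypothetical helper, and the readings are copied verbatim from the "Output before" and "Output after" runs (values in MB, as returned by `get_process_memory()`).

```python
def savings(before, after):
    """Hypothetical helper: {stage: (MB saved, percent saved relative to `before`)}."""
    result = {}
    for stage, old in before.items():
        saved = old - after[stage]
        result[stage] = (saved, 100.0 * saved / old)
    return result


# Readings copied from the two runs of the measurement script (MB).
before = {"fp32 forward": 611.241984, "quantization": 1064.57088, "int8 forward": 1071.005696}
after = {"fp32 forward": 609.513472, "quantization": 890.273792, "int8 forward": 895.2258559999999}

for stage, (saved, pct) in savings(before, after).items():
    print(f"{stage:>12}: saved {saved:6.1f} MB ({pct:4.1f}%)")
```

This reports about 1.7 MB (0.3%) for the fp32 forward pass, and about 174.3 MB and 175.8 MB (16.4% each) after quantization and after the int8 forward pass. The fp32 readings differ by under 2 MB, so the gap only appears once the network is quantized. This is consistent with the change affecting quantized models only.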

