[GitHub] [incubator-mxnet] zhreshold commented on issue #15655: Performance regression for gluon dataloader with large batch size

GitBox Sat, 27 Jul 2019 11:18:24 -0700

zhreshold commented on issue #15655: Performance regression for gluon 
dataloader with large batch size
URL: 
https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515703642
 
 
   @Neutron3529  
   
   Unfortunately, DataLoader has to allocate additional memory as you 
iterate through the dataset, and it uses the mx.nd.stack operator to batch 
images, which hands control to the MXNet engine. In comparison, NDArrayIter 
or plain NumPy array iteration does not incur this extra overhead.
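
To make the contrast concrete, here is a minimal NumPy-only sketch. It is not MXNet's actual implementation: `iter_sliced`, `iter_stacked`, and the zero-filled MNIST-shaped array are hypothetical names and data chosen for illustration. It only shows why slicing one pre-allocated array is cheap, while gathering samples one by one and stacking them allocates a new array for every batch.

```python
import numpy as np

def iter_sliced(data, batch_size):
    """Yield batches as slices (views) of one pre-allocated array."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def iter_stacked(data, batch_size):
    """Yield batches by fetching samples one at a time and stacking them.

    np.stack copies the samples into a freshly allocated array per batch.
    """
    for start in range(0, len(data), batch_size):
        stop = min(start + batch_size, len(data))
        samples = [data[i] for i in range(start, stop)]
        yield np.stack(samples)

# Hypothetical stand-in for an MNIST-shaped dataset (1000 samples, 1x28x28).
data = np.zeros((1000, 1, 28, 28), dtype=np.float32)

sliced = list(iter_sliced(data, 100))
stacked = list(iter_stacked(data, 100))

# Sliced batches share memory with the source; stacked batches are copies.
print(np.shares_memory(sliced[0], data), np.shares_memory(stacked[0], data))
```

Both iterators produce batches of the same shape; the difference is only in how much allocation and copying happens per batch.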
   
   This overhead is most visible for small workloads such as MNIST. For 
large network training, however, a few seconds is negligible compared to the 
per-epoch training or validation time (minutes). 
   In fact, if you have a multi-core CPU, you can speed up the process by 
using multiple workers (num_workers) in this case:
   
   ```python
   import mxnet as mx
   from mxnet import nd
   import time
   
   def data_xform(data):
       """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
       return nd.moveaxis(data, 2, 0).astype('float32') / 255
   
   def bench_time(num_workers=0):
       print('-----\nnum_workers:', num_workers)
       # t1: time to construct the datasets
       tic = time.time()
       train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
       val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)
       t1 = time.time() - tic
       # t2: time to build the loaders and iterate once over the training set
       tic = time.time()
       batch_size = 100  # setting this to 10000 produces the same result
       train_loader = mx.gluon.data.DataLoader(
           train_data, shuffle=True, batch_size=batch_size, num_workers=num_workers)
       val_loader = mx.gluon.data.DataLoader(
           val_data, shuffle=False, batch_size=batch_size, num_workers=num_workers)
       for i, j in train_loader:
           pass
       t2 = time.time() - tic
       print('t1', t1, 't2', t2)
   
   if __name__ == '__main__':
       bench_time(0)
       bench_time(4)
   
   ```
   
   ```bash
   -----
   num_workers: 0
   t1 0.35317301750183105 t2 8.19723916053772
   -----
   num_workers: 4
   t1 0.2771739959716797 t2 3.3613219261169434
   ```
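
For a quick read of those numbers, the short calculation below works out the speedup from the two `t2` timings reported above. The variable names are illustrative only, and note that `t2` in the benchmark also includes constructing the loaders, so this is a rough figure for this one run, not a general result.

```python
# Timings copied from the benchmark output above (seconds).
t2_single = 8.19723916053772    # num_workers=0
t2_multi = 3.3613219261169434   # num_workers=4

speedup = t2_single / t2_multi  # ratio of single-process to 4-worker time
saved = t2_single - t2_multi    # absolute time saved per pass

print(f"speedup: {speedup:.2f}x, time saved: {saved:.2f} s")
```

That is roughly a 2.4x speedup with four workers, saving a little under 5 seconds per pass over the MNIST training set.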


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
