Neutron3529 edited a comment on issue #15655: Performance regression for gluon 
dataloader with large batch size
URL: 
https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515700522
 
 
   > Data loaders are different from data iterators.
   > Data iterators load all the data into memory before iterating over it, while data loaders load data into memory as they iterate, which is why they are slower than data iterators.
   > 
   > **So why do we use data loaders if data iterators are faster?**
   > Some datasets are too large to fit into memory, and data iterators won't work for them.
   > In practice, we should use data iterators when the dataset fits into memory (small datasets, for example); otherwise we have to use data loaders.
   > For more on **data iterators vs. data loaders**, see [this note](https://mxnet.incubator.apache.org/versions/master/architecture/note_data_loading.html).
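
To make the distinction in the quoted text concrete, here is a minimal pure-Python sketch. It is not how MXNet implements either class; `load_sample`, `eager_batches`, and `lazy_batches` are hypothetical names used only for illustration.

```python
def load_sample(i):
    """Hypothetical stand-in for reading and decoding one sample from disk."""
    return i * i


def eager_batches(n, batch_size):
    """Iterator style: materialize every sample first, then slice out batches."""
    data = [load_sample(i) for i in range(n)]  # whole dataset held in memory
    for start in range(0, n, batch_size):
        yield data[start:start + batch_size]


def lazy_batches(n, batch_size):
    """Loader style: load only the samples needed for the current batch."""
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        yield [load_sample(i) for i in range(start, stop)]
```

Both generators yield the same batches. The difference is that the eager version pays the full loading cost and memory footprint up front, while the lazy version spreads that cost across the iteration.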
   
   Maybe you are right.
   BUT... Did you test how slow the DataLoader could be?
   ```
   import mxnet as mx
   from mxnet import nd
   def data_xform(data):
       """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
       return nd.moveaxis(data, 2, 0).astype('float32') / 255
   
   train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
   val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)
   batch_size = 100  # setting this to 10000 produces the same result
   train_loader = mx.gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
   val_loader = mx.gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)
   for i, j in train_loader:
       pass
   ```
   It took 18 s for me to finish the final loop (tested with both mxnet-cu100mkl and mxnet-mkl).
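
For anyone who wants to reproduce a timing like this, here is a minimal sketch of how one pass could be measured. `time_one_pass` is a hypothetical helper, not part of MXNet, and it accepts any iterable.

```python
import time


def time_one_pass(batches):
    """Iterate once over `batches`; return (elapsed_seconds, batch_count)."""
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
    return time.perf_counter() - start, count
```

Calling `time_one_pass(train_loader)` on the loader from the snippet would time the same loop and also report how many batches were produced, which makes runs with different `batch_size` values easier to compare.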
   This is not simply a case of the data loader being **slower** than a data iterator.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
