Neutron3529 edited a comment on issue #15655: Performance regression for gluon dataloader with large batch size
URL: https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515700522

> Data loaders are different from Data iterators.
> Data iterators first load all the data into memory before iterating over it, while data loaders load data into memory as they iterate, hence they are slower than data iterators.
>
> **So why do we use data loaders if data iterators are faster?**
> Sometimes we deal with massive datasets that can't be loaded into memory, so for these datasets data iterators wouldn't work.
> In practice we should use data iterators if our dataset can be loaded into memory (for example, small datasets); otherwise we have to use data loaders.
> For more info regarding **Data iterators vs Data loaders**, check [this](https://mxnet.incubator.apache.org/versions/master/architecture/note_data_loading.html) out.

Maybe you are right.
BUT...
Did you test how slow the DataLoader could be?

```
import mxnet as mx
from mxnet import nd

def data_xform(data):
    """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255

train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)

batch_size = 100  # setting this to 10000 produces the same result
train_loader = mx.gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
val_loader = mx.gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)

# Iterate once over the training set without doing any work per batch.
for i, j in train_loader:
    pass
```

It took 18s for me to finish the final loop (tested with both mxnet-cu100mkl and mxnet-mkl).
It is not the data loader **slower** than data iter
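
For anyone who wants to reproduce the timing, here is a minimal sketch of how a full pass over a loader could be measured. The helper name `time_iteration` and the stand-in `fake_batches` list are hypothetical and not part of MXNet; the stand-in is only there so the snippet runs without MXNet installed.

```python
import time

def time_iteration(iterable):
    """Return (number of batches, elapsed seconds) for one full pass over `iterable`."""
    start = time.perf_counter()
    n_batches = 0
    for _ in iterable:
        n_batches += 1
    elapsed = time.perf_counter() - start
    return n_batches, elapsed

# Hypothetical stand-in data: 600 "batches" of 100 items each.
# With the snippet above, pass `train_loader` instead of `fake_batches`.
fake_batches = [list(range(100)) for _ in range(600)]
n, seconds = time_iteration(fake_batches)
print(f"{n} batches in {seconds:.4f}s")
```

Since MXNet executes operations asynchronously, it may also be worth calling `mx.nd.waitall()` before reading the timer when timing a real loader, so that pending work is included in the measurement.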
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
