Neutron3529 opened a new issue #17870: batch transform with dataloader
URL: https://github.com/apache/incubator-mxnet/issues/17870
 
 
   ## Description
   I found a [performance regression](https://github.com/apache/incubator-mxnet/issues/15655) last year, which is caused by the current strategy of executing the function passed to `transform_first`.
   
   The current behavior is to apply the `transform_first` function to each sample **before** collecting the samples into a batch. With batch_size=500, the function is executed 500 times per batch, which is extremely inefficient.
   
   If we collected the data into a batch first and sent the full batch to the `transform_first` function, the processing could be much faster.
   
   Here is the test result; two nets are provided to simulate the two points at which the `transform_first` function could be applied:
   ```
   Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 
bit (AMD64)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet as mx
   >>> from mxnet.gluon.nn import Dense
   >>> import time
   >>> net=mx.gluon.nn.HybridSequential()#calculate before batchify
   >>> with net.name_scope():
   ...  net.add(Dense(10))
   ...  net.add(Dense(35))
   ...  net.add(Dense(10))
   ...
   >>> n2=mx.gluon.nn.HybridSequential()#calculate after batchify
   >>> with n2.name_scope():
   ...  n2.add(Dense(10))
   ...  n2.add(Dense(35))
   ...  n2.add(Dense(10))
   ...
   >>> a=[mx.nd.random.uniform(shape=(60,)) for i in range(500)]#500 samples of shape (60,), i.e. batch_size=500
   >>> ctx=mx.cpu(0)
   >>> n2.initialize(mx.init.Uniform(), ctx=ctx,force_reinit=True)
   >>> net.initialize(mx.init.Uniform(), ctx=ctx,force_reinit=True)
   >>> for i in range(10):
   ...  ii=time.time()
   ...  _=mx.nd.stack(*(net(x) for x in a))#transform each sample, then batchify: the default order of calculation that MXNet applies
   ...  jj=time.time()-ii
   ...  ii=time.time()
   ...  b=mx.nd.stack(*a)#batchify first
   ...  _=n2(b)#then calculate once on the whole batch, which is much faster
   ...  kk=time.time()-ii
   ...  print((jj,kk))
   ...
   (0.44376635551452637, 0.008976459503173828)
   (0.34108686447143555, 0.0029942989349365234)
   (0.48074865341186523, 0.002991914749145508)
   (0.3600330352783203, 0.0029861927032470703)
   (0.35801076889038086, 0.003988027572631836)
   (0.3530876636505127, 0.0029916763305664062)
   (0.4198474884033203, 0.0019931793212890625)
   (0.41489434242248535, 0.0019941329956054688)
   (0.3879544734954834, 0.003995418548583984)
   (0.419874906539917, 0.0029935836791992188)
   >>>#which means the default order (transform each sample, then batchify) is a total waste of time
   ```
   I tried to force the transform to run after batchify, but I could not find any convenient approach, since most of the built-in transform functions are not designed to work on batched data. If I want to batchify before transforming, I have to write an equivalent batch-level transform function myself.
   
   Actually I've already submitted [what I found](https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515744051), but the problem still exists. Training CIFAR-10 takes an extremely long time for me.
   (With num_workers=3 it merely takes a while, but with the default num_workers=8 an OOM occurs and crashes the training step, since each python.exe eats 2GB of my physical memory (loading libmxnet.dll takes a lot of time and ~2GB of memory).)
