Neutron3529 commented on issue #15655: Performance regression for gluon 
dataloader with large batch size
URL: 
https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515709968
 
 
   Thanks for your reply.
   
   Maybe I should use Linux rather than Windows, since Windows has no `fork` 
and the benchmark takes even longer to run there.
   ```
   -----
   num_workers: 6
   t1 0.628288745880127 t2 28.964709043502808
   -----
   num_workers: 5
   t1 0.6462414264678955 t2 26.153035402297974
   -----
   num_workers: 4
   t1 0.6172983646392822 t2 23.17400550842285
   -----
   num_workers: 3
   t1 0.608367919921875 t2 19.690324306488037
   -----
   num_workers: 2
   t1 0.6791801452636719 t2 18.883481979370117
   -----
   num_workers: 1
   t1 0.6682109832763672 t2 25.414013624191284
   ```
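
On the `fork` point: Python's standard `multiprocessing` module can report which process start methods a platform offers. This is a minimal sketch, not part of the benchmark itself, and the helper name `available_start_methods` is only for illustration:

```python
import multiprocessing as mp
import sys


def available_start_methods():
    """Return the process start methods supported on this platform."""
    return mp.get_all_start_methods()


if __name__ == "__main__":
    methods = available_start_methods()
    # "spawn" is supported on every platform; "fork" only on POSIX systems.
    print(sys.platform, methods)
    print("fork available:", "fork" in methods)
```

On Windows the list contains only `spawn`, so every worker process starts a fresh interpreter instead of inheriting the parent's state. That may account for part of the extra time per worker.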
   I am closing this issue: a small dataset fits on my GPU directly, and my 
GPU is too slow to handle a larger dataset anyway.
   
   Anyway, I hope I can find a faster dataloader in the future.
   
   BTW, importing the mxnet package is too slow, since the mxnet in the `.whl` 
file contains code for a huge number of `CUDA_ARCH`es. I asked how to remove the 
unnecessary `ARCH`es, and found that `nvprune` can do this.
   
   The problem is that the `libmxnet.dll` I have is not relocatable, so `nvprune` 
fails to strip the unnecessary `ARCH`es. I know that `libmxnet.lib` stores all 
the symbols of `libmxnet.dll`, but I don't know how to merge them together.
   
   I wonder whether it is possible to use `nvprune` to get a smaller `libmxnet.dll` 
from a `.whl` file.
   If it is, importing mxnet would be much faster.
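
For reference, here is a rough sketch of how an `nvprune` call could be assembled. It assumes the documented usage `nvprune [options] -o outfile infile` with the `-arch` option. The file names and the `sm_61` target are hypothetical placeholders. According to NVIDIA's documentation `nvprune` works on relocatable host objects and static libraries, so this would not apply to the DLL as shipped:

```python
def nvprune_command(infile, outfile, arch="sm_61"):
    """Build an nvprune argument list that keeps device code for one target.

    Assumes the documented form: nvprune [options] -o outfile infile.
    `arch` is a placeholder; use the architecture of your own GPU.
    """
    return ["nvprune", "-arch", arch, "-o", outfile, infile]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = nvprune_command("libexample_static.lib", "libexample_pruned.lib")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` once a relocatable input is available.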

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
