Neutron3529 commented on issue #15655: Performance regression for gluon dataloader with large batch size
URL: https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515709968

Thanks for your reply.

Maybe I should use Linux rather than Windows: Windows has no `fork`, so the benchmark takes even longer to run there.

```
----- num_workers: 6
t1 0.628288745880127
t2 28.964709043502808
----- num_workers: 5
t1 0.6462414264678955
t2 26.153035402297974
----- num_workers: 4
t1 0.6172983646392822
t2 23.17400550842285
----- num_workers: 3
t1 0.608367919921875
t2 19.690324306488037
----- num_workers: 2
t1 0.6791801452636719
t2 18.883481979370117
----- num_workers: 1
t1 0.6682109832763672
t2 25.414013624191284
```

I am closing this issue: a small dataset can be stored directly on my GPU, and my GPU could not support a larger dataset anyway since it is quite slow. Still, I hope I can find a faster dataloader in the future.

BTW, I find that importing the mxnet package is too slow, because the mxnet in the `.whl` file contains a huge number of `CUDA_ARCH`es. I asked how to remove the unnecessary `ARCH`es and found that `nvprune` can do this. The problem is that the `libmxnet.dll` I have is not relocatable, so `nvprune` failed to strip the unnecessary `ARCH`es. I know that `libmxnet.lib` stores all the symbols of `libmxnet.dll`, but I don't know how to merge the two.

I wonder whether it is possible to use `nvprune` to get a smaller `libmxnet.dll` from a `.whl` file. If so, importing mxnet would be much faster.
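The timings above are easier to compare once parsed. The following is a minimal sketch and not part of the original benchmark script: the `LOG` string is copied from the output above, `parse_timings` is a hypothetical helper introduced only for illustration, and it assumes `t2` is the timing of interest (the comment does not state what `t1` and `t2` measure).

```python
import re

# Benchmark output copied verbatim from the comment above.
LOG = """
----- num_workers: 6
t1 0.628288745880127
t2 28.964709043502808
----- num_workers: 5
t1 0.6462414264678955
t2 26.153035402297974
----- num_workers: 4
t1 0.6172983646392822
t2 23.17400550842285
----- num_workers: 3
t1 0.608367919921875
t2 19.690324306488037
----- num_workers: 2
t1 0.6791801452636719
t2 18.883481979370117
----- num_workers: 1
t1 0.6682109832763672
t2 25.414013624191284
"""


def parse_timings(log):
    """Hypothetical helper: map num_workers -> {"t1": float, "t2": float}."""
    results = {}
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        header = re.match(r"-+\s*num_workers:\s*(\d+)$", line)
        if header:
            # Start a new record for this worker count.
            current = int(header.group(1))
            results[current] = {}
            continue
        timing = re.match(r"(t\d+)\s+([0-9.]+)$", line)
        if timing and current is not None:
            results[current][timing.group(1)] = float(timing.group(2))
    return results


timings = parse_timings(LOG)

# Assumption: a lower t2 is better; pick the worker count that minimizes it.
best = min(timings, key=lambda n: timings[n]["t2"])

for n in sorted(timings):
    print(f"num_workers={n}: t1={timings[n]['t1']:.3f} t2={timings[n]['t2']:.3f}")
print("lowest t2 at num_workers =", best)
```

On these numbers `t1` stays roughly constant across worker counts, while `t2` is lowest at `num_workers: 2` and grows with each additional worker.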
