jiajinyu opened a new issue #11462: throughput of sparse linear classification is small with small batch size
URL: https://github.com/apache/incubator-mxnet/issues/11462

## Description
For small batch sizes, sparse linear classification saturates all CPU cores, but throughput is low.

## Environment info (Required)
Machine used: AWS AMI, c5.9xlarge

Steps to reproduce:
1. `pip2 install mxnet-mkl`
2. `git clone` mxnet
3. In directory `incubator-mxnet/example/sparse/linear_classification`, run `python2 train.py --batch-size 1`

We see throughput of around 600 samples/sec. I tried settings like `export OMP_NUM_THREADS=<vCPUs / 2>`. After setting this, CPU usage drops (only half of the cores are used), but throughput does not change. This is the case even when I set `OMP_NUM_THREADS=1`.

## Question
How should I set things up to increase the throughput of linear classification training on a single machine with multiple cores? Or does MXNet currently not optimize in this direction (i.e. not use techniques like Hogwild!)? Thanks in advance.

with @lcytzk
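For reference, the thread-cap experiment described above can be sketched as a small script. This is only a sketch of the setup, not a fix: it assumes you are already inside `incubator-mxnet/example/sparse/linear_classification` with `mxnet-mkl` installed, and it derives the "half the vCPUs" value with `nproc` rather than hard-coding 18 for a c5.9xlarge.

```shell
#!/bin/sh
# Sketch: cap OpenMP threads at half the available vCPUs before training.
# Assumes mxnet-mkl is installed and the current directory is
# incubator-mxnet/example/sparse/linear_classification.

HALF_CORES=$(( $(nproc) / 2 ))          # e.g. 18 on a 36-vCPU c5.9xlarge
[ "$HALF_CORES" -ge 1 ] || HALF_CORES=1 # never go below one thread
export OMP_NUM_THREADS=$HALF_CORES
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Baseline run from the issue (reported ~600 samples/sec at batch size 1):
# python2 train.py --batch-size 1
```

As noted in the issue, lowering `OMP_NUM_THREADS` reduced CPU usage without changing throughput, which suggests the bottleneck at batch size 1 is per-sample overhead rather than compute parallelism.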
