amswly opened a new issue #20103: URL: https://github.com/apache/incubator-mxnet/issues/20103
## Description

After a power cut, the GPU training speed on my Symbol project dropped from 800+ samples/s to about 10 samples/s.

### Error Message

(Paste the complete error message. Please also include stack trace by setting environment variable `DMLC_LOG_STACK_TRACE_DEPTH=100` before running your script.)

## To Reproduce

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

### Steps to reproduce

(Paste the commands you ran that produced the error.)

1.
2.

## What have you tried to solve it?

1. I set `kvstore = 'local'` instead of `'device'`, and the speed is now about 600+ samples/s, which is still not as fast as before. Does anyone have a solution for this problem?
2.

## Environment

***We recommend using our script for collecting the diagnostic information with the following command***
`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`

<details>
<summary>Environment Information</summary>

```
# Paste the diagnose.py command output here
```

</details>

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
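For comparing the `kvstore` settings mentioned in the issue on equal footing, a small timing helper may be useful. This is only a minimal sketch and not part of MXNet: `measure_throughput`, `train_step`, and `batches` are hypothetical names introduced here for illustration, and the helper assumes that `train_step` blocks until the work for one batch has actually finished.

```python
import time


def measure_throughput(train_step, batches, batch_size, clock=time.perf_counter):
    """Return the average number of samples processed per second.

    train_step : hypothetical callable that runs one training step on a batch.
                 Assumption: it blocks until the step is complete. MXNet
                 executes asynchronously, so a real step would need to
                 synchronize (for example with mx.nd.waitall()) before returning.
    batches    : iterable of batches to feed to train_step.
    batch_size : number of samples in each batch.
    clock      : timer function; injectable so the helper can be tested.
    """
    start = clock()
    num_batches = 0
    for batch in batches:
        train_step(batch)
        num_batches += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return num_batches * batch_size / elapsed
```

Running the same number of batches once with `kvstore='device'` and once with `kvstore='local'`, and timing each run this way, would give directly comparable samples/s figures for the report.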
