wy3406 opened a new issue #19498: URL: https://github.com/apache/incubator-mxnet/issues/19498
## Description
(A clear and concise description of what the bug is.)

I have a few issues/questions regarding SyncBN. When I train a custom image segmentation model with BN, GPU memory usage is normal. After replacing BN with SyncBN, GPU memory usage grows gradually with each iteration until it fills the entire GPU memory, and then training hangs. I tried a smaller batch size than the one I use with BN, but it still ends up taking all the GPU memory. Note that no warning is shown when I use SyncBN. Is there something I have missed?

- Environment: Python 3.6.9; TITAN RTX × 8; CUDA 10.1
- Framework: mxnet-cu101-1.7.0 and gluoncv-0.8.0

### Error Message
(Paste the complete error message. Please also include stack trace by setting environment variable `DMLC_LOG_STACK_TRACE_DEPTH=100` before running your script.)

## To Reproduce
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

### Steps to reproduce
(Paste the commands you ran that produced the error.)

1.
2.

## What have you tried to solve it?

1.
2.

## Environment

***We recommend using our script for collecting the diagnostic information with the following command***

`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`

<details>
<summary>Environment Information</summary>

```
# Paste the diagnose.py command output here
```

</details>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
